Quality control

Quality control is a collection of metrics evaluated on a data asset.

QCMetric objects should be generated during pipelines: from raw data, during processing, and during analysis by researchers.

Every QCMetric has a Status (aind_data_schema.core.quality_control.Status), which is determined by comparing the value of the metric to some rule. A metric can only pass or fail; metrics that still require manual evaluation are set to pending.

Details

Metrics

Each QCMetric is a single value or array of values that can be computed, or observed, about one modality in a data asset. The value can have any type. Metrics should be significant, i.e. whether they pass or fail should matter for the modality. Metrics need to be human-understandable. If you find yourself generating more than fifty metrics for a modality, you should group them together, for example by making the value a dictionary that combines similar metrics and the rule an evaluation of multiple fields in that dictionary.
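As a minimal sketch of the grouping advice above, the snippet below combines several related measurements into one dictionary value and applies a single rule across its fields. The field names, thresholds, and the noise_rule helper are hypothetical and are not part of aind-data-schema.

```python
# Hypothetical grouped value: several related measurements stored as one metric value.
# Field names and numbers are illustrative placeholders only.
noise_summary = {
    "median_rms_uv": 8.2,
    "max_rms_uv": 21.5,
    "fraction_noisy_channels": 0.03,
}


def noise_rule(value: dict) -> bool:
    """Pass only when every field in the grouped value is within its (assumed) limit."""
    return (
        value["median_rms_uv"] < 10.0
        and value["max_rms_uv"] < 30.0
        and value["fraction_noisy_channels"] < 0.05
    )


grouped_passes = noise_rule(noise_summary)
```

The dictionary would then be used as the value of one QCMetric, and the rule would be spelled out in its description.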

Each QCMetric has a Status. The Status should depend directly on the QCMetric.value, either through a simple function such as “value > 5” or through a qualitative rule such as “Field of view includes visual areas”. The QCMetric.description field should describe the rule used to set the status. Metrics can be evaluated multiple times, in which case each new status should be appended to the QCMetric.status_history.
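The sketch below illustrates re-evaluating a metric and appending each result to its history. It uses standard-library stand-ins (SketchStatus, SketchQCStatus, SketchMetric) rather than the real aind-data-schema classes, and the metric name and values are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, List


class SketchStatus(str, Enum):
    """Stand-in for the three allowed states."""

    FAIL = "Fail"
    PASS = "Pass"
    PENDING = "Pending"


@dataclass
class SketchQCStatus:
    """Stand-in for a single evaluation record."""

    evaluator: str
    status: SketchStatus
    timestamp: datetime


@dataclass
class SketchMetric:
    """Stand-in for a metric with a value, a rule description, and a history."""

    name: str
    value: Any
    description: str
    status_history: List[SketchQCStatus] = field(default_factory=list)


def evaluate_threshold(metric: SketchMetric, evaluator: str, when: datetime) -> None:
    """Apply the rule 'value > 5' and append the result instead of overwriting it."""
    status = SketchStatus.PASS if metric.value > 5 else SketchStatus.FAIL
    metric.status_history.append(SketchQCStatus(evaluator, status, when))


example_metric = SketchMetric(name="Example metric", value=4.0, description="Pass when value > 5")
evaluate_threshold(example_metric, "Automated", datetime(2022, 11, 22, tzinfo=timezone.utc))

# After reprocessing, the value changes and the metric is evaluated again.
example_metric.value = 7.5
evaluate_threshold(example_metric, "Automated", datetime(2022, 11, 23, tzinfo=timezone.utc))
```

The latest entry in status_history reflects the current state, while earlier entries preserve the record of previous evaluations.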

Each QCMetric is annotated with three pieces of additional metadata: the Stage during which it was evaluated, the Modality of the evaluated data, and tags.

Curations

If you find yourself computing a value for something smaller than an entire modality of data in an asset, you are performing curation, i.e. you are determining the status of a subset of a modality in the data asset. We provide the CurationMetric for this purpose. You should put a dictionary in the CurationMetric.value field that contains a mapping between the subsets (usually neurons, ROIs, channels, etc.) and their values.
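A curation value of this kind might look like the mapping sketched below. The unit identifiers and labels are hypothetical, and constructing the CurationMetric itself is deliberately left out.

```python
# Hypothetical mapping from subsets of a modality (here, sorted units) to their values.
unit_labels = {
    "unit_0": "good",
    "unit_1": "noise",
    "unit_2": "good",
    "unit_3": "multi-unit",
}

# Downstream code can then select the subsets that passed curation.
accepted_units = sorted(unit for unit, label in unit_labels.items() if label == "good")
```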

Tags

Tags are any strings that naturally group sets of metrics together. Good tags are things like “Probe A”, “Motion correction”, and “Pose tracking”. The stage and modality are automatically treated as tags, so you do not need to include them in the tags list.
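To illustrate how stage and modality can act as implicit tags, the sketch below filters plain dictionaries rather than real QCMetric objects. The helper names and the modality strings are assumptions made for illustration, not the library's API.

```python
from typing import Dict, List, Set

# Plain-dictionary stand-ins for metrics; modality and stage strings are placeholders.
sketch_metrics: List[Dict] = [
    {"name": "Probe A drift", "modality": "ecephys", "stage": "Raw data",
     "tags": ["Drift map", "Probe A"]},
    {"name": "Probe B drift", "modality": "ecephys", "stage": "Raw data",
     "tags": ["Drift map", "Probe B"]},
    {"name": "Video 1 frame count", "modality": "behavior-videos", "stage": "Raw data",
     "tags": ["Frame count checks", "Video 1"]},
]


def effective_tags(metric: Dict) -> Set[str]:
    """Explicit tags plus the modality and stage, which are treated as tags too."""
    return set(metric["tags"]) | {metric["modality"], metric["stage"]}


def names_with_tags(metrics: List[Dict], *wanted: str) -> List[str]:
    """Return the names of metrics that carry every requested tag."""
    return [m["name"] for m in metrics if set(wanted) <= effective_tags(m)]
```

For example, names_with_tags(sketch_metrics, "ecephys", "Probe B") selects only the Probe B drift metric, even though "ecephys" never appears in its tags list.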

QualityControl.evaluate_status()

You can evaluate the state of a set of metrics filtered by any combination of modalities, stages, and tags on a specific date (by default, today). When evaluating the Status of a group of metrics the following rules apply:

First, any metric that is failing and also has a matching tag (or tuple of tags) in the QualityControl.allow_tag_failures list is set to pass. This allows you to specify that certain metrics are not critical to a data asset.

Then, given the status of all the metrics in the group:

  1. If any metric is still failing, the evaluation fails

  2. If any metric is pending and the rest pass, the evaluation is pending

  3. If all metrics pass the evaluation passes
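The rules above can be sketched in a few lines of plain Python. This is not the library's implementation: each metric is reduced to a (latest status, tags) pair, filtering by date is omitted, and the interpretation that a tuple entry matches only when all of its tags are present on the metric is an assumption.

```python
from typing import List, Sequence, Tuple, Union

TagEntry = Union[str, Tuple[str, ...]]


def failure_allowed(tags: Sequence[str], allow_tag_failures: Sequence[TagEntry]) -> bool:
    """True when any allowed entry matches the metric's tags.

    Assumption: a tuple entry matches only if every tag in it is on the metric.
    """
    tag_set = set(tags)
    for entry in allow_tag_failures:
        if isinstance(entry, tuple):
            if set(entry) <= tag_set:
                return True
        elif entry in tag_set:
            return True
    return False


def evaluate_group(
    metrics: List[Tuple[str, List[str]]],
    allow_tag_failures: Sequence[TagEntry] = (),
) -> str:
    """Combine (status, tags) pairs into a single 'Pass', 'Pending', or 'Fail'."""
    statuses = []
    for status, tags in metrics:
        # First rule: failures on allowed tags are treated as passes.
        if status == "Fail" and failure_allowed(tags, allow_tag_failures):
            status = "Pass"
        statuses.append(status)

    if "Fail" in statuses:
        return "Fail"
    if "Pending" in statuses:
        return "Pending"
    return "Pass"
```

With this sketch, a group containing a failing metric tagged "Video 2" evaluates to a pass when "Video 2" is in the allowed list, and to a fail otherwise.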

Q: What is a metric reference?

Each QCMetric should include a QCMetric.reference. References should be publicly accessible images, figures, multi-panel figures, and videos that support the metric value/status or provide the information necessary for manual annotation.

It’s good practice to share a single multi-panel figure across multiple references to simplify viewing the quality control.

Q: What are the status options for metrics?

In our quality control a metric’s status is always PASS, PENDING (waiting for manual annotation), or FAIL.

We enforce this minimal set of states to prevent ambiguity and make it easier to build tools that can interpret the status of a data asset.

Multi-asset QC

During analysis there are many situations where multiple data assets need to be pulled together, often for comparison. For example, FOVs across imaging sessions or recording sessions from a chronic probe might need to be matched up across days. When a QCMetric is calculated from multiple assets, it should be tagged with Stage.MULTI_ASSET, and it needs to track the assets that were used to generate it in the evaluated_assets list.
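The convention can be sketched with a plain dictionary standing in for a metric. The asset names, the metric name, and the check_multi_asset helper are hypothetical placeholders; whether and how the library enforces this pairing is not shown here.

```python
from typing import Dict

# Stand-in for a metric computed from two assets; names are placeholders.
fov_match = {
    "name": "FOV match across sessions",
    "stage": "Multi-asset",
    "value": True,
    "evaluated_assets": ["asset_day_1_placeholder", "asset_day_2_placeholder"],
}

# Stand-in for an ordinary single-asset metric.
single_asset = {
    "name": "Probe A drift",
    "stage": "Raw data",
    "value": "Low",
    "evaluated_assets": None,
}


def check_multi_asset(metric: Dict) -> bool:
    """Multi-asset metrics must list their assets; other metrics leave the field as None."""
    if metric["stage"] == "Multi-asset":
        return bool(metric.get("evaluated_assets"))
    return metric.get("evaluated_assets") is None
```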

Example

"""Example quality control processing"""

from datetime import datetime, timezone

from aind_data_schema_models.modalities import Modality

from aind_data_schema.core.quality_control import QualityControl, QCMetric, Stage, Status, QCStatus

t = datetime(2022, 11, 22, 0, 0, 0, tzinfo=timezone.utc)

s = QCStatus(evaluator="Automated", status=Status.PASS, timestamp=t)
sp = QCStatus(evaluator="", status=Status.PENDING, timestamp=t)

# Example of how to use a dictionary to provide options for a metric in the QC portal
drift_value_with_options = {
    "value": "",
    "options": ["Low", "Medium", "High"],
    "status": [
        "Pass",
        "Fail",
        "Fail",
    ],  # when set, this field will be used to automatically parse the status, blank forces manual update
    "type": "dropdown",
}

# Example of how to use a dictionary to provide multiple checkable flags, some of which will fail the metric
drift_value_with_flags = {
    "value": "",
    "options": [
        "No Drift",
        "Drift visible in part of acquisition",
        "Drift visible in entire acquisition",
        "Sudden movement event",
    ],
    "status": ["Pass", "Pass", "Fail", "Fail"],
    "type": "checkbox",
}

metrics = [
    QCMetric(
        name="Probe A drift",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when drift map shows minimal movement",
        value=drift_value_with_options,
        reference="ecephys-drift-map",
        status_history=[sp],
        tags=["Drift map", "Probe A"],
    ),
    QCMetric(
        name="Probe B drift",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when drift map shows minimal movement",
        value=drift_value_with_flags,
        reference="ecephys-drift-map",
        status_history=[sp],
        tags=["Drift map", "Probe B"],
    ),
    QCMetric(
        name="Probe C drift",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when drift map shows minimal movement",
        value="Low",
        reference="ecephys-drift-map",
        status_history=[s],
        tags=["Drift map", "Probe C"],
    ),
    QCMetric(
        name="Expected frame count",
        modality=Modality.BEHAVIOR_VIDEOS,
        stage=Stage.RAW,
        description="Expected frame count from experiment length, always pass",
        value=662,
        status_history=[s],
        tags=["Frame count checks"],
    ),
    QCMetric(
        name="Video 1 frame count",
        modality=Modality.BEHAVIOR_VIDEOS,
        stage=Stage.RAW,
        description="Pass when frame count matches expected",
        value=662,
        status_history=[s],
        tags=["Frame count checks", "Video 1"],
    ),
    QCMetric(
        name="Video 2 num frames",
        modality=Modality.BEHAVIOR_VIDEOS,
        stage=Stage.RAW,
        description="Pass when frame count matches expected",
        value=662,
        status_history=[s],
        tags=["Frame count checks", "Video 2"],
    ),
    QCMetric(
        name="ProbeA",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when probe is present in the recording",
        value=True,
        status_history=[s],
        tags=["Probes present"],
    ),
    QCMetric(
        name="ProbeB",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when probe is present in the recording",
        value=True,
        status_history=[s],
        tags=["Probes present"],
    ),
    QCMetric(
        name="ProbeC",
        modality=Modality.ECEPHYS,
        stage=Stage.RAW,
        description="Pass when probe is present in the recording",
        value=True,
        status_history=[s],
        tags=["Probes present"],
    ),
]

q = QualityControl(
    metrics=metrics,
    default_grouping=["Drift map", "Frame count checks", "Probes present"],
    allow_tag_failures=["Video 2"],  # this will allow the Video 2 metric to fail without failing the entire QC
)

if __name__ == "__main__":
    serialized = q.model_dump_json()
    deserialized = QualityControl.model_validate_json(serialized)
    q.write_standard_file()

Core file

QualityControl

Collection of quality control metrics evaluated on a data asset to determine pass/fail status

| Field | Type | Description |
|---|---|---|
| metrics | List[QCMetric or CurationMetric] | |
| key_experimenters | Optional[List[Person]] | Experimenters who are responsible for quality control of this data asset |
| notes | Optional[str] | |
| default_grouping | List[str] | Default tag grouping for this QualityControl object, used in visualizations |
| allow_tag_failures | List[str or tuple] | List of tags that are allowed to fail without failing the overall QC |
| status | Optional[dict] | Mapping of tags, modalities, and stages to their evaluated status, automatically computed |

Model definitions

CurationHistory

Schema to track curator name and timestamp for curation events

| Field | Type | Description |
|---|---|---|
| curator | Person | |
| timestamp | datetime (timezone-aware) | |

CurationMetric

Description of a curation metric

| Field | Type | Description |
|---|---|---|
| value | List[typing.Any] | |
| type | str | |
| curation_history | List[CurationHistory] | |
| name | str | |
| modality | Modality | |
| stage | Stage | |
| status_history | List[QCStatus] | |
| description | Optional[str] | |
| reference | Optional[str] | |
| tags | List[str] | Tags group QCMetric objects to allow for grouping and filtering |
| evaluated_assets | Optional[List[str]] | Set to None except when a metric’s calculation required data coming from a different data asset. |

QCMetric

Description of a single quality control metric

| Field | Type | Description |
|---|---|---|
| name | str | |
| modality | Modality | |
| stage | Stage | |
| value | typing.Any | |
| status_history | List[QCStatus] | |
| description | Optional[str] | |
| reference | Optional[str] | |
| tags | List[str] | Tags group QCMetric objects to allow for grouping and filtering |
| evaluated_assets | Optional[List[str]] | Set to None except when a metric’s calculation required data coming from a different data asset. |

QCStatus

Description of a QC status, set by an evaluator

| Field | Type | Description |
|---|---|---|
| evaluator | str | |
| status | Status | |
| timestamp | datetime (timezone-aware) | |

Stage

Quality control stage

When during data processing the QC metrics were derived.

| Name | Value |
|---|---|
| RAW | Raw data |
| PROCESSING | Processing |
| ANALYSIS | Analysis |
| MULTI_ASSET | Multi-asset |

Status

QC Status

| Name | Value |
|---|---|
| FAIL | Fail |
| PASS | Pass |
| PENDING | Pending |