Welcome to aind-data-schema

Code repository

Data acquired at the Allen Institute for Neural Dynamics (AIND) is accompanied by metadata describing how it was acquired, processed, and analyzed. This metadata is stored in JSON files according to the schema defined in this library. Our goal in capturing this metadata is to make our data findable and understandable.

Explore the schema through the core file pages in the left sidebar, or through this interactive diagram.

Data assets acquired from a live subject or in vivo specimen must contain the following core metadata files:

data_description: Administrative metadata about the source of the data, funding, relevant licenses, and restrictions on use.
subject: Species, genotype, age, sex, and source.
procedures: Metadata about any procedures performed prior to data acquisition, including subject procedures (surgeries, behavior training, etc.) and specimen procedures (tissue preparation, staining, etc.).
instrument: Metadata describing the equipment used to acquire data, including part names and serial numbers.
acquisition: Metadata describing what devices were active during acquisition and their configuration.
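
As a rough illustration of the requirement above, the following sketch checks a data asset directory for the five required core files. It uses only the Python standard library and is not part of aind-data-schema. The function name `missing_core_files` is hypothetical, and the layout (one `<name>.json` file per core file at the top level of the asset directory) is an assumption.

```python
from pathlib import Path

# Core files listed above as required for data assets acquired from a
# live subject or in vivo specimen.
REQUIRED_CORE_FILES = (
    "data_description",
    "subject",
    "procedures",
    "instrument",
    "acquisition",
)


def missing_core_files(asset_dir):
    """Return the names of required core files absent from asset_dir.

    Assumption: each core file is stored as <name>.json at the top level
    of the asset directory. This helper is hypothetical, not library API.
    """
    asset_dir = Path(asset_dir)
    return [
        name
        for name in REQUIRED_CORE_FILES
        if not (asset_dir / f"{name}.json").is_file()
    ]
```

An empty list means all five files are present. The check only looks at file names; it does not validate the contents against the schema.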

After data analysis, additional processing and quality control metadata is captured:

processing: Metadata describing how data has been processed and analyzed into derived data assets, including information on the software and parameters used.
quality_control: Evaluations and metrics describing the quality of a data asset.
model: Metadata describing machine learning models created from or used to analyze data assets.

Finally, the core files are combined into a single metadata.json file:

metadata: The combined set of core files, plus the asset location (e.g. on S3).
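
To make the idea of a combined file concrete, here is a minimal sketch that gathers whichever core files exist in an asset directory into one dictionary alongside the asset location. It relies only on the standard library. The function name `combine_core_files`, the `location` key, and the `<name>.json` file layout are assumptions for illustration; the actual structure of metadata.json is defined by the schema in this library.

```python
import json
from pathlib import Path

# Core file names taken from the lists above.
CORE_FILES = (
    "data_description",
    "subject",
    "procedures",
    "instrument",
    "acquisition",
    "processing",
    "quality_control",
    "model",
)


def combine_core_files(asset_dir, location):
    """Merge the core JSON files found in asset_dir into one dictionary.

    Assumptions: files are named <name>.json, and the asset location is
    stored under a "location" key. Both are illustrative choices.
    """
    asset_dir = Path(asset_dir)
    combined = {"location": location}
    for name in CORE_FILES:
        path = asset_dir / f"{name}.json"
        if path.is_file():  # core files that are not present are skipped
            combined[name] = json.loads(path.read_text())
    return combined
```

Passing the result to `json.dumps(..., indent=2)` would produce text that could be written to a metadata.json file. The location argument would be something like an S3 URI for the asset.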

The core files are built from many smaller schema objects. These are stored in the components and registries. Registries are specifically used for schema objects that are part of a controlled vocabulary. Some registries are linked to external standards.

components: Component schemas used to build up the core files (devices, configurations, etc.).
registries: Component schemas that are part of a controlled vocabulary.
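
The distinction between components and registries can be sketched with plain Python types. Every class, field, and member name below is hypothetical and chosen only for illustration; none of them is taken from aind-data-schema. The enum stands in for a registry (a closed set of allowed values), and the dataclasses stand in for components that are assembled into a larger core-file-like object.

```python
from dataclasses import dataclass
from enum import Enum


class Species(Enum):
    """Hypothetical registry: the only values a species field may take."""

    MUS_MUSCULUS = "Mus musculus"
    HOMO_SAPIENS = "Homo sapiens"


@dataclass
class Device:
    """Hypothetical component: a reusable building block."""

    name: str
    serial_number: str


@dataclass
class Subject:
    """Hypothetical core-file-like object that uses a registry value."""

    subject_id: str
    species: Species


@dataclass
class Instrument:
    """Hypothetical core-file-like object built from components."""

    instrument_id: str
    devices: list
```

Looking up a value outside the enum, for example `Species("unicorn")`, raises a `ValueError`, which is the kind of guarantee a controlled vocabulary is meant to provide.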

I want to…