Data description

The data_description.json file tracks administrative information about a data asset, including affiliated researchers/organizations, projects, data modalities, dates of collection, and more.

Uniqueness

Every data asset is uniquely identified by its DataDescription.name field, which combines the subject_id and the acquisition session_end_time. You can group data assets together using DataDescription.tags (a List[str]). Tags should be shared across the assets within an experiment. Do not repeat information in the tags that already exists elsewhere in the metadata; for example, modalities should never be included in tags.
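As a rough illustration of grouping by tags, the sketch below indexes assets by tag using plain dictionaries. The asset names, the tag values, and the `group_by_tag` helper are hypothetical and not part of aind-data-schema; the records keep only the two fields relevant here.

```python
from collections import defaultdict

# Hypothetical, simplified records: only "name" and "tags" are kept.
assets = [
    {"name": "asset_a", "tags": ["experiment-1", "pilot"]},
    {"name": "asset_b", "tags": ["experiment-1"]},
    {"name": "asset_c", "tags": None},  # tags is optional, so it may be absent
]


def group_by_tag(records):
    """Map each tag to the names of the assets that carry it."""
    groups = defaultdict(list)
    for record in records:
        # Treat a missing or None tags field as an empty list.
        for tag in record.get("tags") or []:
            groups[tag].append(record["name"])
    return dict(groups)


groups = group_by_tag(assets)
print(groups)
```

Assets without tags simply do not appear in any group, which is one reason to share tags consistently across the assets of an experiment.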

Example

""" example data description """

from datetime import datetime, timezone

from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.organizations import Organization

from aind_data_schema.components.identifiers import Person
from aind_data_schema.core.data_description import Funding, DataDescription

d = DataDescription(
    modalities=[Modality.ECEPHYS, Modality.BEHAVIOR_VIDEOS],
    subject_id="12345",
    # creation_time must be timezone-aware
    creation_time=datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc),
    institution=Organization.AIND,
    investigators=[Person(name="Jane Smith")],
    funding_source=[Funding(funder=Organization.AI)],
    project_name="Example project",
    data_level="raw",
)

if __name__ == "__main__":
    # Round-trip through JSON to confirm the model validates
    serialized = d.model_dump_json()
    deserialized = DataDescription.model_validate_json(serialized)
    # Write the validated model to its standard JSON file
    deserialized.write_standard_file()

Core file

DataDescription

Description of a logical collection of data files

| Field | Type | Description |
|---|---|---|
| license | License | |
| subject_id | Optional[str] | Unique identifier for the subject of data acquisition |
| creation_time | datetime (timezone-aware) | Time that data files were created, used to uniquely identify the data |
| tags | Optional[List[str]] | Descriptive strings to help categorize and search for data |
| name | Optional[str] | Name of data, conventionally also the name of the directory containing all data and metadata |
| institution | Organization | An established society, corporation, foundation or other organization that collected this data |
| funding_source | List[Funding] | Funding source. If internal funding, select ‘Allen Institute’ |
| data_level | DataLevel | Level of processing that data has undergone |
| group | Optional[Group] | A short name for the group of individuals that collected this data |
| investigators | List[Person] | Full name(s) of key investigators (e.g. PI, lead scientist, contact person) |
| project_name | str | A name for a set of coordinated activities intended to achieve one or more objectives. |
| restrictions | Optional[str] | Detail any restrictions on publishing or sharing these data |
| modalities | List[Modality] | A short name for the specific manner, characteristic, pattern of application, or the employment of any technology or formal procedure to generate data for a study |
| data_summary | Optional[str] | Semantic summary of experimental goal |
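Because creation_time is listed as a timezone-aware datetime, it can help to check a value before building a DataDescription. The sketch below uses only the Python standard library; `is_timezone_aware` is a hypothetical helper, not part of aind-data-schema, and attaching UTC to a naive value is only correct if that value was actually recorded in UTC.

```python
from datetime import datetime, timezone


def is_timezone_aware(dt: datetime) -> bool:
    """Return True if dt carries a usable UTC offset (per the datetime docs)."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


aware = datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc)
naive = datetime(2022, 2, 21, 16, 30, 1)

# Assumption: the naive value was recorded in UTC, so attaching UTC is valid.
naive_as_utc = naive.replace(tzinfo=timezone.utc)

print(is_timezone_aware(aware), is_timezone_aware(naive), is_timezone_aware(naive_as_utc))
```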

Model definitions

Funding

Description of funding sources

| Field | Type | Description |
|---|---|---|
| funder | Organization | |
| grant_number | Optional[str] | |
| fundee | Optional[List[Person]] | Person(s) funded by this mechanism |