Data description

The data_description.json file tracks administrative information about a data asset, including affiliated researchers/organizations, projects, data modalities, dates of collection, and more.

Uniqueness

Every data asset is uniquely identified by its DataDescription.name field, which combines the subject_id and the acquisition session_end_time. You can group data assets together using DataDescription.tags (List[str]). Tags should be shared across the assets within an experiment. Do not repeat information in the tags that already exists elsewhere in the metadata; for example, modalities should never be included in tags.
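As a rough illustration of grouping by tags, the sketch below builds a tag-to-asset index using only the standard library. The records are hypothetical, simplified stand-ins for the contents of several data_description.json files; the asset names are illustrative placeholders, and group_by_tag is a made-up helper, not part of aind-data-schema.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified records: only the "name" and "tags" fields are kept.
assets = [
    {"name": "123456_2022-02-21_16-30-01", "tags": ["Pilot data"]},
    {"name": "123457_2022-02-22_10-15-00", "tags": ["Pilot data", "Cohort A"]},
    {"name": "123458_2022-03-01_09-00-00", "tags": None},  # tags are optional
]


def group_by_tag(records: List[dict]) -> Dict[str, List[str]]:
    """Map each tag to the names of the assets that carry it."""
    groups = defaultdict(list)
    for record in records:
        # Treat a missing or null tags field as an empty list.
        for tag in record.get("tags") or []:
            groups[tag].append(record["name"])
    return dict(groups)


groups = group_by_tag(assets)
print(groups["Pilot data"])
```

Because the name is unique per asset, it is a safe value to collect under each tag; assets without tags simply do not appear in the index.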

Example

"""example data description"""

import argparse
from datetime import datetime, timezone

from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.organizations import Organization
from aind_data_schema_models.data_name_patterns import DataLevel

from aind_data_schema.core.data_description import Funding, DataDescription
from aind_data_schema.components.identifiers import Person

d = DataDescription(
    modalities=[Modality.ECEPHYS, Modality.BEHAVIOR_VIDEOS],
    subject_id="123456",
    creation_time=datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc),
    institution=Organization.AIND,
    investigators=[Person(name="Daniel Birman", registry_identifier="0000-0003-3748-6289")],
    funding_source=[Funding(funder=Organization.AI)],
    project_name="Example project",
    data_level=DataLevel.RAW,
    tags=["Pilot data"],
)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", default=None, help="Output directory for generated JSON file")
    args = parser.parse_args()

    serialized = d.model_dump_json()
    deserialized = DataDescription.model_validate_json(serialized)
    deserialized.write_standard_file(output_directory=args.output_dir)
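The creation_time in the example carries tzinfo=timezone.utc, and the schema lists the field as a timezone-aware datetime. The following standard-library sketch shows one way to tell a naive datetime from an aware one before building a DataDescription. The helper is_timezone_aware is a hypothetical name used for illustration, not part of aind-data-schema.

```python
from datetime import datetime, timezone


def is_timezone_aware(value: datetime) -> bool:
    """Return True when the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


naive = datetime(2022, 2, 21, 16, 30, 1)  # no tzinfo attached
aware = datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc)

print(is_timezone_aware(naive))  # False
print(is_timezone_aware(aware))  # True
print(aware.isoformat())  # 2022-02-21T16:30:01+00:00
```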

Core file

DataDescription

Description of a logical collection of data files

| Field | Type | Title (Description) |
|---|---|---|
| license | License | License |
| subject_id | Optional[str] | Subject ID (Unique identifier for the subject of data acquisition) |
| creation_time | datetime (timezone-aware) | Creation Time (Time that data files were created, used to uniquely identify the data) |
| tags | Optional[List[str]] | Tags (Descriptive strings to help categorize and search for data) |
| name | Optional[str] | Data asset name (When left blank, a name will be generated based on subject_id and creation_time. Conventionally also used as the name of the data folder.) |
| institution | Organization | Institution (An established society, corporation, foundation or other organization that collected this data) |
| funding_source | List[Funding] | Funding source (Funding source. If internal funding, select ‘Allen Institute’) |
| data_level | DataLevel | Data Level (Level of processing that data has undergone) |
| group | Optional[Group] | Group (A short name for the group of individuals that collected this data) |
| investigators | List[Person] | Investigators (Full name(s) of key investigators (e.g. PI, lead scientist, contact person)) |
| project_name | str | Project Name (A name for a set of coordinated activities intended to achieve one or more objectives.) |
| restrictions | Optional[str] | Restrictions (Detail any restrictions on publishing or sharing these data) |
| modalities | List[Modality] | Modalities (A short name for the specific manner, characteristic, pattern of application, or the employment of any technology or formal procedure to generate data for a study) |
| source_data | Optional[List[str]] | Source data (For derived assets, list the source data asset names used to create this data) |
| data_summary | Optional[str] | Data summary (Semantic summary of experimental goal) |
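Since source_data holds the names of the assets a derived asset was built from, those links can be followed back to the original acquisitions. The sketch below assumes a hypothetical in-memory index keyed by asset name. The names, the suffixes on derived assets, and the raw_sources helper are all illustrative and not part of aind-data-schema.

```python
from typing import Dict, List

# Hypothetical index: asset name -> simplified data description.
records: Dict[str, dict] = {
    "123456_2022-02-21_16-30-01": {"source_data": None},
    "123456_2022-02-21_16-30-01_sorted": {
        "source_data": ["123456_2022-02-21_16-30-01"],
    },
    "123456_2022-02-21_16-30-01_sorted_curated": {
        "source_data": ["123456_2022-02-21_16-30-01_sorted"],
    },
}


def raw_sources(name: str, index: Dict[str, dict]) -> List[str]:
    """Follow source_data links until reaching assets that list no sources."""
    found = []
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against repeated or cyclic references
        seen.add(current)
        sources = index[current].get("source_data") or []
        if not sources:
            found.append(current)
        stack.extend(sources)
    return sorted(found)


print(raw_sources("123456_2022-02-21_16-30-01_sorted_curated", records))
```

An asset with no source_data resolves to itself, so the same helper works for raw and derived assets alike.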

Model definitions

Funding

Description of funding sources

| Field | Type | Title (Description) |
|---|---|---|
| funder | Organization | Funder |
| grant_number | Optional[str] | Grant number |
| fundee | Optional[List[Person]] | Fundee (Person(s) funded by this mechanism) |