Data description
The data_description.json file tracks administrative information about a data asset, including affiliated researchers/organizations, projects, data modalities, dates of collection, and more.
Uniqueness
Every data asset is uniquely identified by its DataDescription.name field, which combines the subject_id and the acquisition session_end_time. You can group data assets using the DataDescription.tags field (List[str]). Share tags across the assets within an experiment. Do not repeat information in tags that already exists elsewhere in the metadata; for example, never include modalities in tags.
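The naming and grouping rules above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `build_name` and `group_by_tag` are hypothetical helpers, the underscore separator and timestamp format are assumptions, and the plain dictionaries stand in for real DataDescription objects.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_name(subject_id: str, end_time: datetime) -> str:
    """Join a subject ID and a timestamp (separator and format are assumed)."""
    return f"{subject_id}_{end_time.strftime('%Y-%m-%d_%H-%M-%S')}"


def group_by_tag(assets: list[dict]) -> dict[str, list[str]]:
    """Map each tag to the names of the assets that carry it."""
    groups = defaultdict(list)
    for asset in assets:
        for tag in asset.get("tags") or []:
            groups[tag].append(asset["name"])
    return dict(groups)


# Two hypothetical assets from the same experiment share one tag.
assets = [
    {
        "name": build_name("12345", datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc)),
        "tags": ["pilot-cohort"],
    },
    {
        "name": build_name("67890", datetime(2022, 2, 22, 9, 15, 0, tzinfo=timezone.utc)),
        "tags": ["pilot-cohort"],
    },
]
grouped = group_by_tag(assets)
```

Because the tag describes the experiment rather than the modality or subject, both assets land in the same group while their names stay distinct.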
Example
"""example data description"""

from datetime import datetime, timezone

from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.organizations import Organization

from aind_data_schema.components.identifiers import Person
from aind_data_schema.core.data_description import DataDescription, Funding

# Describe a raw data asset with two modalities for subject 12345
d = DataDescription(
    modalities=[Modality.ECEPHYS, Modality.BEHAVIOR_VIDEOS],
    subject_id="12345",
    creation_time=datetime(2022, 2, 21, 16, 30, 1, tzinfo=timezone.utc),
    institution=Organization.AIND,
    investigators=[Person(name="Jane Smith")],
    funding_source=[Funding(funder=Organization.AI)],
    project_name="Example project",
    data_level="raw",
)

if __name__ == "__main__":
    # Round-trip through JSON to confirm the model validates, then write the file
    serialized = d.model_dump_json()
    deserialized = DataDescription.model_validate_json(serialized)
    deserialized.write_standard_file()
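Once a data_description.json file exists, it can be inspected without the schema library. The sketch below is a hedged illustration using only the standard library: the payload is a simplified, hypothetical stand-in (the real file holds more fields and nested objects), and the list of keys to check is an assumption chosen for the example.

```python
import json
import tempfile
from pathlib import Path

# Simplified stand-in for the serialized metadata (hypothetical values).
payload = {
    "subject_id": "12345",
    "creation_time": "2022-02-21T16:30:01Z",
    "project_name": "Example project",
    "data_level": "raw",
}

with tempfile.TemporaryDirectory() as tmp:
    # Write the stand-in file, then read it back as plain JSON.
    path = Path(tmp) / "data_description.json"
    path.write_text(json.dumps(payload, indent=3))
    loaded = json.loads(path.read_text())

# Report any keys this example expects but the file does not contain.
expected_keys = ("subject_id", "creation_time", "data_level")
missing = [key for key in expected_keys if key not in loaded]
```

For real files, validating with DataDescription.model_validate_json, as in the example above, is the stronger check; plain JSON parsing only confirms that the keys are present.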
Core file
DataDescription
Description of a logical collection of data files
| Field | Type | Description |
|---|---|---|
| subject_id | | Unique identifier for the subject of data acquisition |
| creation_time | | Time that data files were created, used to uniquely identify the data |
| tags | | Descriptive strings to help categorize and search for data |
| name | | Name of data, conventionally also the name of the directory containing all data and metadata |
| institution | | An established society, corporation, foundation or other organization that collected this data |
| funding_source | List[Funding] | Funding source. If internal funding, select ‘Allen Institute’ |
| data_level | | Level of processing that data has undergone |
| group | Optional[Group] | A short name for the group of individuals that collected this data |
| investigators | List[Person] | Full name(s) of key investigators (e.g. PI, lead scientist, contact person) |
| project_name | | A name for a set of coordinated activities intended to achieve one or more objectives. |
| restrictions | | Detail any restrictions on publishing or sharing these data |
| modalities | List[Modality] | A short name for the specific manner, characteristic, pattern of application, or the employment of any technology or formal procedure to generate data for a study |
| data_summary | | Semantic summary of experimental goal |
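Since tags should not repeat what the modalities field already records, a small check can flag redundant tags before metadata is written. This is a sketch under stated assumptions: `redundant_tags` is a hypothetical helper, not part of the schema library, and the modality strings are illustrative plain-text stand-ins for Modality values.

```python
def redundant_tags(tags: list[str], modalities: list[str]) -> list[str]:
    """Return the tags that merely repeat a modality name (case-insensitive)."""
    known = {modality.lower() for modality in modalities}
    return [tag for tag in tags if tag.lower() in known]


# Illustrative values: the second tag duplicates a modality and is flagged.
tags = ["pilot-cohort", "ecephys"]
modalities = ["ecephys", "behavior-videos"]
flagged = redundant_tags(tags, modalities)
```

Any tag that appears in `flagged` can be dropped, because the same information is already searchable through the modalities field.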
Model definitions
Funding
Description of funding sources
| Field | Type | Description |
|---|---|---|
| funder | | |
| fundee | Optional[List[Person]] | Person(s) funded by this mechanism |