Model


The Model metadata schema extends the Processing schema to describe model weights and the other data and code artifacts that underlie a machine learning model. A model may be trained on one dataset and evaluated on others, and may be trained further in future versions.

For this reason, training and evaluation are recorded as lists of steps, so new training and evaluation steps can be appended for each new model version. This metadata should be documented for any model that sees widespread internal use or public release, to support model reuse and record provenance.
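The append-only versioning pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Step`, `ModelRecord` and `new_version` are hypothetical stand-ins written for this example, not the schema's actual classes, and the model and dataset names are placeholders.

```python
from dataclasses import dataclass, field, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a training or evaluation step."""
    name: str
    dataset: str


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical, trimmed-down stand-in for the Model metadata."""
    name: str
    version: str
    training: List[Step] = field(default_factory=list)
    evaluations: List[Step] = field(default_factory=list)


def new_version(prev: ModelRecord, version: str,
                training: Sequence[Step] = (),
                evaluations: Sequence[Step] = ()) -> ModelRecord:
    """Return a record for the next version: earlier steps are kept, new ones appended."""
    return replace(
        prev,
        version=version,
        training=list(prev.training) + list(training),
        evaluations=list(prev.evaluations) + list(evaluations),
    )


# Hypothetical model and dataset names, for illustration only.
v1 = ModelRecord("example-model", "1.0",
                 training=[Step("initial training", "dataset A")],
                 evaluations=[Step("held-out evaluation", "dataset B")])
v2 = new_version(v1, "2.0",
                 training=[Step("fine-tuning", "dataset C")],
                 evaluations=[Step("evaluation on new data", "dataset D")])
print([s.name for s in v2.training])  # ['initial training', 'fine-tuning']
```

Copying the previous record rather than mutating it keeps the version 1.0 record intact, so each version's provenance remains readable on its own.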

Core file

Model

Description of a machine learning model including architecture, training, and evaluation details

Field | Type | Description
--- | --- | ---
name | str |
version | str |
example_run_code | Code | Code to run the model, possibly including example parameters/data
architecture | ModelArchitecture | Model architecture / type of model
software_framework | Optional[Software] |
architecture_parameters | dict | Parameters of model architecture, such as input signature or number of layers.
intended_use | str | Semantic description of intended use
limitations | Optional[str] |
training | List[ModelTraining or ModelPretraining] |
evaluations | List[ModelEvaluation] |
notes | Optional[str] |
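As a rough sketch of how the fields in this table fit together, the dataclass below mirrors them in plain Python. It is a hypothetical illustration, not the schema's implementation: the nested types `Code`, `ModelArchitecture` and `Software` are replaced by strings, the fields are reordered so that those without defaults come first (a dataclass requirement), and giving `training` and `evaluations` empty-list defaults is an assumption. The run command and parameter values are placeholders.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class ModelSketch:
    """Hypothetical mirror of the Model table; nested schema types are simplified."""
    name: str
    version: str
    example_run_code: str                     # stands in for Code
    architecture: str                         # stands in for ModelArchitecture
    architecture_parameters: Dict[str, Any]   # e.g. input signature, number of layers
    intended_use: str
    # Assumed empty defaults; entries would be ModelTraining/ModelPretraining records.
    training: List[Any] = field(default_factory=list)
    evaluations: List[Any] = field(default_factory=list)   # ModelEvaluation records
    software_framework: Optional[str] = None                # stands in for Optional[Software]
    limitations: Optional[str] = None
    notes: Optional[str] = None


def optional_fields(cls) -> List[str]:
    """Names of fields that default to None, i.e. the Optional[...] fields in this sketch."""
    return [f.name for f in fields(cls) if f.default is None]


# Placeholder values, for illustration only.
model = ModelSketch(
    name="example-classifier",
    version="1.0",
    example_run_code="python run.py --weights model.pt",
    architecture="convolutional neural network",
    architecture_parameters={"input_shape": [1, 64, 64], "num_layers": 5},
    intended_use="Classify image patches",
)
print(optional_fields(ModelSketch))  # ['software_framework', 'limitations', 'notes']
```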

Model definitions

ModelEvaluation

Description of model evaluation

Field | Type | Description
--- | --- | ---
process_type | ProcessName |
performance | List[PerformanceMetric] |
name | str | Unique name of the processing step. If not provided, the type will be used as the name.
stage | ProcessStage |
code | Code | Code used for processing
experimenters | List[Person] | People responsible for processing
pipeline_name | Optional[str] | Name of the pipeline; must exist in Processing.pipelines
start_date_time | datetime (timezone-aware) |
end_date_time | datetime (timezone-aware) |
output_path | Optional[AssetPath] | Path to processing outputs, if stored.
output_parameters | dict | Output parameters
notes | Optional[str] |
resources | Optional[ResourceUsage] |
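Two rules in this table are easy to get wrong in practice: the name falls back to the process type, and both timestamps must be timezone-aware. The sketch below enforces them on a hypothetical, cut-down mirror of ModelEvaluation. `EvaluationSketch`, `accepts` and the `"model_evaluation"` process-type string are illustrative assumptions, not real ProcessName values or schema classes, and the end-after-start check is an extra assumption not stated in the table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional


@dataclass
class EvaluationSketch:
    """Hypothetical, simplified mirror of a subset of ModelEvaluation fields."""
    process_type: str                      # stands in for ProcessName
    start_date_time: datetime
    end_date_time: datetime
    performance: List[Any] = field(default_factory=list)   # PerformanceMetric entries
    name: Optional[str] = None
    output_path: Optional[str] = None      # stands in for Optional[AssetPath]

    def __post_init__(self) -> None:
        # Documented rule: if no name is given, use the process type as the name.
        if self.name is None:
            self.name = self.process_type
        # Documented rule: both date-times must be timezone-aware.
        for ts in (self.start_date_time, self.end_date_time):
            if ts.tzinfo is None or ts.utcoffset() is None:
                raise ValueError("date-times must be timezone-aware")
        # Assumed sanity check (not stated in the table): no negative durations.
        if self.end_date_time < self.start_date_time:
            raise ValueError("end_date_time precedes start_date_time")


def accepts(start: datetime, end: datetime) -> bool:
    """Return True if the sketch accepts these timestamps."""
    try:
        EvaluationSketch("model_evaluation", start, end)
        return True
    except ValueError:
        return False


start = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
ev = EvaluationSketch("model_evaluation", start, start + timedelta(hours=2))
print(ev.name)  # model_evaluation
```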

ModelPretraining

Description of model pretraining

Field | Type | Description
--- | --- | ---
source_url | str | URL for pretrained weights
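Because `Model.training` accepts either ModelTraining or ModelPretraining entries, code that reads it has to handle both. The sketch below shows one way to do that with hypothetical stand-in classes (`PretrainingSketch`, `TrainingSketch` and `describe` are illustrative, not schema classes). The http(s) check on `source_url` is an assumption; the table only says the field is a string URL, and the URL used is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Union
from urllib.parse import urlparse


@dataclass
class PretrainingSketch:
    """Hypothetical mirror of ModelPretraining: where the pretrained weights came from."""
    source_url: str

    def __post_init__(self) -> None:
        # Assumed check (not part of the documented schema): require an http(s) URL.
        parsed = urlparse(self.source_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {self.source_url!r}")


@dataclass
class TrainingSketch:
    """Hypothetical, heavily trimmed stand-in for ModelTraining."""
    name: str


TrainingEntry = Union[PretrainingSketch, TrainingSketch]


def describe(steps: List[TrainingEntry]) -> List[str]:
    """Summarize a training list that may mix pretraining and training entries."""
    out = []
    for step in steps:
        if isinstance(step, PretrainingSketch):
            out.append(f"pretrained weights from {step.source_url}")
        else:
            out.append(f"trained: {step.name}")
    return out


history = [
    PretrainingSketch("https://example.org/weights/base.pt"),  # placeholder URL
    TrainingSketch("fine-tuning on in-house data"),
]
print(describe(history))
```

Listing the pretraining entry first keeps the training list in chronological order, which makes the origin of the starting weights explicit.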

ModelTraining

Description of model training

Field | Type | Description
--- | --- | ---
process_type | ProcessName |
train_performance | List[PerformanceMetric] | Performance on training set
test_performance | Optional[List[PerformanceMetric]] | Performance on test data, evaluated during training
test_evaluation_method | Optional[str] | Approach to cross-validation or train/test splitting
name | str | Unique name of the processing step. If not provided, the type will be used as the name.
stage | ProcessStage |
code | Code | Code used for processing
experimenters | List[Person] | People responsible for processing
pipeline_name | Optional[str] | Name of the pipeline; must exist in Processing.pipelines
start_date_time | datetime (timezone-aware) |
end_date_time | datetime (timezone-aware) |
output_path | Optional[AssetPath] | Path to processing outputs, if stored.
output_parameters | dict | Output parameters
notes | Optional[str] |
resources | Optional[ResourceUsage] |
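A minimal sketch of two checks a consumer of ModelTraining records might run: that `pipeline_name` refers to a known pipeline, and how far training performance sits above test performance. Everything here is hypothetical: `TrainingStepSketch`, `TrainMetric`, `check_pipeline` and `generalization_gap` are illustrative helpers, `Processing.pipelines` is assumed to reduce to a plain list of names, and the metric values, pipeline name and `"model_training"` process type are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class TrainMetric:
    """Hypothetical stand-in for PerformanceMetric."""
    name: str
    value: Any


@dataclass
class TrainingStepSketch:
    """Hypothetical, simplified mirror of a subset of ModelTraining fields."""
    process_type: str                                        # stands in for ProcessName
    train_performance: List[TrainMetric] = field(default_factory=list)
    test_performance: Optional[List[TrainMetric]] = None
    test_evaluation_method: Optional[str] = None
    pipeline_name: Optional[str] = None


def check_pipeline(step: TrainingStepSketch, pipelines: Sequence[str]) -> None:
    """Enforce the documented rule that pipeline_name must exist in Processing.pipelines.

    `pipelines` is assumed here to be a plain list of pipeline names.
    """
    if step.pipeline_name is not None and step.pipeline_name not in pipelines:
        raise ValueError(f"unknown pipeline: {step.pipeline_name!r}")


def generalization_gap(step: TrainingStepSketch, metric: str) -> Optional[float]:
    """Train minus test value for one metric, or None if either value is missing."""
    if not step.test_performance:
        return None
    train = {m.name: m.value for m in step.train_performance}
    test = {m.name: m.value for m in step.test_performance}
    if metric not in train or metric not in test:
        return None
    return train[metric] - test[metric]


step = TrainingStepSketch(
    process_type="model_training",
    train_performance=[TrainMetric("accuracy", 0.95)],
    test_performance=[TrainMetric("accuracy", 0.90)],
    test_evaluation_method="5-fold cross-validation",
    pipeline_name="training-pipeline",
)
check_pipeline(step, ["training-pipeline"])   # passes: the name is known
gap = generalization_gap(step, "accuracy")    # roughly 0.05 (floating point)
```

Recording `test_evaluation_method` alongside `test_performance` matters because a gap computed from a single split and one from cross-validation are not directly comparable.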

PerformanceMetric

Description of a performance metric

Field | Type | Description
--- | --- | ---
name | str |
value | typing.Any |
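Since `value` is typed `typing.Any`, a metric can hold a scalar, a list, or a nested structure. The sketch below uses a hypothetical `PerformanceMetricSketch` dataclass and illustrative metric values (not real results) to show this, and round-trips the metrics through JSON as a cheap check that every value is serializable. That metadata is stored as JSON is an assumption of this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List


@dataclass
class PerformanceMetricSketch:
    """Hypothetical mirror of PerformanceMetric: a named value of any type."""
    name: str
    value: Any


# Illustrative values only.
metrics: List[PerformanceMetricSketch] = [
    PerformanceMetricSketch("accuracy", 0.91),
    PerformanceMetricSketch("per_class_recall", [0.88, 0.94]),
    PerformanceMetricSketch("confusion_matrix", [[45, 5], [3, 47]]),
]

# Because `value` is untyped, a JSON round trip catches values (e.g. arbitrary
# objects) that could not be written to a metadata file.
serialized = json.dumps([asdict(m) for m in metrics])
restored = [PerformanceMetricSketch(**d) for d in json.loads(serialized)]
print(restored[0])  # PerformanceMetricSketch(name='accuracy', value=0.91)
```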