Schema Description

The FAIR data registry schema is designed to capture the information required about the different research objects associated with an epidemiological pipeline.

Main entities

The main entities represented in the schema describe the data and the software used by epidemiological models:

  • DataProduct: A dataset that is used by or is generated by a model.
  • ExternalObject: An external data object, i.e. one that comes from outside the modelling pipeline rather than being generated by it.
  • CodeRun: A code run along with its associated code repo, configuration, inputs and outputs.
  • CodeRepoRelease: Information marking that an Object is an official release of model code.
  • Object: Core traceability object used to represent any FAIR data object, such as a DataProduct, CodeRepoRelease or CodeRun.
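
As an illustration of how these entities relate, here is a minimal Python sketch. It is not the registry's actual implementation: every class layout and field name below (description, obj, code_repo, config, inputs, outputs) is an assumption chosen for readability, and ExternalObject is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Object:
    """Core traceability record that the other entities point at."""
    description: str  # hypothetical field


@dataclass
class DataProduct:
    """A dataset that is used by or generated by a model."""
    name: str     # hypothetical field
    version: str  # hypothetical field
    obj: Object


@dataclass
class CodeRepoRelease:
    """Marks an Object as an official release of model code."""
    name: str     # hypothetical field
    version: str  # hypothetical field
    obj: Object


@dataclass
class CodeRun:
    """A code run with its code repo, configuration, inputs and outputs."""
    code_repo: Object
    config: Object
    inputs: List[Object] = field(default_factory=list)
    outputs: List[Object] = field(default_factory=list)


# Hypothetical example data.
repo = Object("model source code")
release = CodeRepoRelease("example-model", "2.0.0", repo)
config = Object("run configuration")
cases = DataProduct("cases", "1.0.0", Object("input case counts"))
forecast = DataProduct("forecast", "0.1.0", Object("model output"))

run = CodeRun(repo, config, inputs=[cases.obj], outputs=[forecast.obj])
```

In this sketch a CodeRun refers to Objects rather than to DataProducts directly, which mirrors the description of Object as the common traceability record.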

Data representation

To represent data, the schema also supports categorising datasets using:

  • Namespace: A way to group DataProducts according to a specific categorisation criterion, such as the organisation that produced the data product.
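
The grouping role of a Namespace can be sketched as follows. This is an illustrative Python example only: the field names and the group_by_namespace helper are hypothetical and are not taken from the registry code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Namespace:
    name: str  # hypothetical field, e.g. the producing organisation


@dataclass(frozen=True)
class DataProduct:
    namespace: Namespace
    name: str     # hypothetical field
    version: str  # hypothetical field


def group_by_namespace(products: Iterable[DataProduct]) -> Dict[str, List[str]]:
    """Collect data product names under the name of their namespace."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for product in products:
        grouped[product.namespace.name].append(product.name)
    return dict(grouped)


# Hypothetical organisations and data products.
org_a = Namespace("org_a")
org_b = Namespace("org_b")
products = [
    DataProduct(org_a, "cases", "1.0.0"),
    DataProduct(org_b, "cases", "1.0.0"),
    DataProduct(org_a, "deaths", "1.0.0"),
]

grouped = group_by_namespace(products)
```

In this sketch two organisations can each hold a data product called "cases" without the two being confused, because the namespace is part of what identifies the product.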

[Diagram: data products]

Models representation

[Diagram: code runs]

FAIR research objects

The schema also represents FAIR objects, which are associated with DataProducts, CodeRuns and so on.

[Diagram: FAIR object]

The Object is associated with:

  • Licence: A licence that can be associated with an Object when the code or data source has a specific licence that needs to be recorded.
  • StorageLocation: The location of an item relative to a StorageRoot.
  • StorageRoot: The root location of a storage cache where model files are stored.
  • QualityControlled: Marks that the associated Object has been quality controlled.
  • Keyword: Keywords that can be associated with an Object, usually for use with ExternalObjects to record paper keywords and similar metadata.
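
The relationship between StorageLocation and StorageRoot, where a location is stored relative to its root, can be sketched like this. The field names, the full_uri helper and the example URL are all assumptions made for illustration, not the registry's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRoot:
    root: str  # hypothetical field: base URI of the storage cache


@dataclass(frozen=True)
class StorageLocation:
    storage_root: StorageRoot
    path: str  # hypothetical field: location relative to the root

    def full_uri(self) -> str:
        """Join root and relative path with exactly one slash between them."""
        return self.storage_root.root.rstrip("/") + "/" + self.path.lstrip("/")


# Hypothetical root and file location.
root = StorageRoot("https://data.example.org/cache/")
location = StorageLocation(root, "models/run1/output.csv")

uri = location.full_uri()
```

Keeping only the relative path on each location means that, in this sketch, moving a whole cache requires changing a single StorageRoot rather than every StorageLocation.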

The Object is also associated with entities related to users and authors:

  • Author: Authors that can be associated with an Object, usually for use with ExternalObjects to record paper authors and similar metadata.
  • UserAuthor: An association between an Author and a particular user.

Internal provenance

Finally, the schema also includes some entities that are used internally:

  • BaseModel: Base model for all objects in the database. Used to define common fields and functionality that keep internal provenance information, such as when an object was last updated and by whom.
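
The idea of a shared base that records when an entry was last updated and by whom can be sketched as below. This is a plain Python illustration under assumed names (updated_by, last_updated, touch, keyphrase); it does not reproduce how the registry itself implements BaseModel.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BaseModel:
    """Common provenance fields inherited by every entity in this sketch."""
    updated_by: Optional[str] = None         # hypothetical field
    last_updated: Optional[datetime] = None  # hypothetical field

    def touch(self, user: str) -> None:
        """Record who changed the entry and when (in UTC)."""
        self.updated_by = user
        self.last_updated = datetime.now(timezone.utc)


@dataclass
class Keyword(BaseModel):
    """Example entity; it picks up the provenance fields from BaseModel."""
    keyphrase: str = ""  # hypothetical field


keyword = Keyword(keyphrase="epidemiology")
keyword.touch("alice")  # hypothetical user name
```

Because the provenance fields live on the base class, every entity defined this way carries them without repeating the definitions.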

Schema Diagram

The whole schema diagram can be accessed on the remote data registry. The image is also included here; you can open it in a new tab to see it at full scale: