Provenance Report

To make it possible to trace the evidence behind reported results, the data registry can produce a provenance report for each data product.

Provenance is the documented history of processes in a digital object’s lifecycle.

The provenance reports generated by the registry are built on the concepts of activities, agents and entities. For more information about these concepts, see the PROV Ontology (PROV-O).

Provenance reports are only available for DataProducts and can be accessed via the RESTful API, for example:

https://data.fairdatapipeline.org/api/prov-report/3/

Query parameters #

  • format

    • api: an HTML representation of the report with media type of text/html
    • json: a JSON representation of the report with media type of application/json
    • json-ld: a JSON-LD representation of the report with media type of application/ld+json
    • jpg: an image representing the provenance with media type of image/jpeg
    • svg: an interactive image representing the provenance with media type of image/svg+xml
    • xml: an XML representation of the report with media type of text/xml
    • provn: a PROV-N representation of the report with media type of text/provenance-notation
  • aspect_ratio

    • <float>: a float used to define the ratio for the JPEG and SVG images. The default is 0.71, which is equivalent to A4 landscape.
  • attributes

    • True (default): show the attributes associated with an object on the image
    • False: hide the attributes associated with an object on the image
  • dpi

    • <float>: a float used to define the dpi for the JPEG and SVG images
  • depth

    • <integer>: an integer used to determine how many levels of code runs to include. The default is 1.
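The query parameters above can be combined in a single request URL. The following is a minimal sketch that only builds the URL and makes no network request. The helper name prov_report_url is hypothetical, and the base URL is taken from the example above; a local registry would use its own host.

```python
from urllib.parse import urlencode

# Base URL taken from the example above; a local registry would differ.
BASE_URL = "https://data.fairdatapipeline.org/api/prov-report/"


def prov_report_url(data_product_id, **params):
    """Build a provenance report URL for a DataProduct id.

    Keyword arguments are passed through unchanged as query parameters
    (format, aspect_ratio, attributes, dpi, depth). This hypothetical
    helper does not validate them against the registry.
    """
    url = f"{BASE_URL}{data_product_id}/"
    if params:
        # urlencode converts each value with str(), so False becomes "False"
        url += "?" + urlencode(params)
    return url


print(prov_report_url(3, format="svg", depth=2, attributes=False))
```

The resulting string can be passed to any HTTP client; the media type of the response depends on the format parameter as listed above.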

Prefixes #

All activities, agents and entities have a URI. Prefixes are used to represent the base component of these URIs. Two prefixes are used: reg for the central registry and lreg for a local registry. Hence, depending on where the object is located, you may see:

reg:api/data_product/1

or

lreg:api/data_product/1

Using provn as an example, you may see a section similar to:

  prefix lreg <http://192.168.20.10:8000/>
  prefix fair <https://data.fairdatapipeline.org/vocab/#>
  prefix dcat <http://www.w3.org/ns/dcat#>
  prefix dcmitype <http://purl.org/dc/dcmitype/>
  prefix dcterms <http://purl.org/dc/terms/>
  prefix foaf <http://xmlns.com/foaf/spec/#>
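To show how a prefixed identifier maps back to a full URI, here is a small sketch that reads prefix declarations like those above and expands a name such as lreg:api/data_product/1. The function names parse_prefixes and expand are hypothetical, and the regular expression only assumes the simple "prefix name <uri>" line layout shown here, not the full PROV-N grammar.

```python
import re

# Prefix declarations copied from the PROV-N excerpt above.
PROVN_HEADER = """
  prefix lreg <http://192.168.20.10:8000/>
  prefix fair <https://data.fairdatapipeline.org/vocab/#>
  prefix dcat <http://www.w3.org/ns/dcat#>
  prefix dcterms <http://purl.org/dc/terms/>
"""

# Matches lines of the form: prefix <name> <uri-in-angle-brackets>
PREFIX_LINE = re.compile(r"^\s*prefix\s+(\S+)\s+<([^>]*)>\s*$")


def parse_prefixes(text):
    """Return a dict mapping each declared prefix to its base URI."""
    prefixes = {}
    for line in text.splitlines():
        match = PREFIX_LINE.match(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
    return prefixes


def expand(qualified_name, prefixes):
    """Expand 'prefix:local' into a full URI using the prefix map."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qualified_name!r}")
    return prefixes[prefix] + local


prefixes = parse_prefixes(PROVN_HEADER)
print(expand("lreg:api/data_product/1", prefixes))
```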

Examples #

Basic Example #

For a simple case, the report will contain two entities, a DataProduct and an ExternalObject, where the DataProduct is a specializationOf an ExternalObject.

An example of a basic provenance diagram #

Basic provenance diagram

This is an example of the XML that is produced:

<prov:document xmlns:lreg="http://192.168.20.10:8000/" xmlns:fair="https://data.fairdatapipeline.org/vocab/#" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/spec/#" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <prov:entity prov:id="lreg:api/data_product/1">
        <prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
        <dcat:hasVersion>0.20210915.0</dcat:hasVersion>
        <dcterms:description>Static parameters of the model</dcterms:description>
        <dcterms:format>Comma-Separated Values File</dcterms:format>
        <dcterms:modified xsi:type="xsd:dateTime">2021-09-15T14:16:42.899768+00:00</dcterms:modified>
        <dcterms:title>
            disease/sars_cov2/SEIRS_model/parameters/static_params
        </dcterms:title>
        <fair:namespace>PSU</fair:namespace>
        <prov:atLocation>
            file:///var/folders/0f/fj5r_1ws15x4jzgnm27h_y6h0000gr/T/tmpukqzlyig/data_store//PSU/disease/sars_cov2/SEIRS_model/parameters/static_params/0.20210915.0.csv
        </prov:atLocation>
    </prov:entity>

    <prov:person prov:id="lreg:api/author/1">
        <foaf:name>Interface Test</foaf:name>
    </prov:person>

    <prov:wasAttributedTo>
        <prov:entity prov:ref="lreg:api/data_product/1"/>
        <prov:agent prov:ref="lreg:api/author/1"/>
        <prov:role xsi:type="xsd:QName">dcterms:creator</prov:role>
    </prov:wasAttributedTo>
    
    <prov:entity prov:id="lreg:api/external_object/1">
        <prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
        <dcat:hasVersion>0.20210915.0</dcat:hasVersion>
        <dcterms:issued xsi:type="xsd:dateTime">2021-09-15T15:16:42+00:00</dcterms:issued>
        <dcterms:title>Static parameters of the model</dcterms:title>
        <fair:alternate_identifier>
            SEIRS model parameters - Static parameters of the model
        </fair:alternate_identifier>
        <fair:alternate_identifier_type>SEIRS_model_params</fair:alternate_identifier_type>
    </prov:entity>

    <prov:specializationOf>
        <prov:specificEntity prov:ref="lreg:api/external_object/1"/>
        <prov:generalEntity prov:ref="lreg:api/data_product/1"/>
    </prov:specializationOf>

</prov:document>
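A report in this form can be read with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the document above; the function names entity_titles and specializations are hypothetical, and only the elements needed for the illustration are kept.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"
NS = {"prov": PROV, "dcterms": "http://purl.org/dc/terms/"}

# Trimmed copy of the report above, keeping only ids, titles and the relation.
XML = """<prov:document xmlns:lreg="http://192.168.20.10:8000/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:prov="http://www.w3.org/ns/prov#">
    <prov:entity prov:id="lreg:api/data_product/1">
        <dcterms:title>
            disease/sars_cov2/SEIRS_model/parameters/static_params
        </dcterms:title>
    </prov:entity>
    <prov:entity prov:id="lreg:api/external_object/1">
        <dcterms:title>Static parameters of the model</dcterms:title>
    </prov:entity>
    <prov:specializationOf>
        <prov:specificEntity prov:ref="lreg:api/external_object/1"/>
        <prov:generalEntity prov:ref="lreg:api/data_product/1"/>
    </prov:specializationOf>
</prov:document>"""


def entity_titles(root):
    """Map each top-level entity id to its dcterms:title (whitespace stripped)."""
    titles = {}
    for entity in root.findall("prov:entity", NS):
        ident = entity.get(f"{{{PROV}}}id")
        title = entity.findtext("dcterms:title", default="", namespaces=NS)
        titles[ident] = title.strip()
    return titles


def specializations(root):
    """Return (specificEntity, generalEntity) reference pairs."""
    pairs = []
    for rel in root.findall("prov:specializationOf", NS):
        specific = rel.find("prov:specificEntity", NS).get(f"{{{PROV}}}ref")
        general = rel.find("prov:generalEntity", NS).get(f"{{{PROV}}}ref")
        pairs.append((specific, general))
    return pairs


root = ET.fromstring(XML)
print(entity_titles(root))
print(specializations(root))
```

Because attributes such as prov:id are namespace-qualified, they are looked up with the full namespace URI in braces rather than with the prov: prefix.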

A Data Product Generated from a Code Run #

In a complete example, a DataProduct entity would have a wasGeneratedBy relationship with a CodeRun activity, a wasAttributedTo relationship with an Author agent, and a wasDerivedFrom relationship with one or more DataProduct entities.

In turn, the CodeRun would have a wasStartedBy relationship with an Author and would have used a model_configuration, a submission_script, a CodeRepoRelease and one or more DataProducts.
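The relationships described above can be pictured as a set of (subject, relation, object) triples. The sketch below is illustrative only: the identifiers such as data_product/2 and code_run/1 are hypothetical placeholders rather than registry output, and the helper inputs_of is not part of the registry.

```python
# Relation triples (subject, relation, object) following the description
# above. All identifiers are hypothetical placeholders.
RELATIONS = [
    ("data_product/2", "wasGeneratedBy", "code_run/1"),
    ("data_product/2", "wasAttributedTo", "author/1"),
    ("data_product/2", "wasDerivedFrom", "data_product/1"),
    ("code_run/1", "wasStartedBy", "author/1"),
    ("code_run/1", "used", "model_configuration"),
    ("code_run/1", "used", "submission_script"),
    ("code_run/1", "used", "code_repo_release"),
    ("code_run/1", "used", "data_product/1"),
]


def inputs_of(activity, relations=RELATIONS):
    """List everything a given activity 'used', in declaration order."""
    return [obj for subj, rel, obj in relations
            if subj == activity and rel == "used"]


print(inputs_of("code_run/1"))
```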

An example of a provenance diagram #

Provenance diagram example