config.yaml file

The user should write a config.yaml file describing the data products used in the code run. The example config.yaml file below describes a code run with the following inputs:

  • disease/sars_cov2/SEINRD_model/parameters/static_params
  • disease/sars_cov2/SEINRD_model/parameters/rts
  • disease/sars_cov2/SEINRD_model/parameters/efoi

These inputs are listed in the register block. This means they are downloaded from an external source to the local data store, and their associated metadata is stored in the local registry. fair run automatically converts these entries into a read block. If the data products are already present in the data registry, list them in the read block instead.

A code run usually also has outputs, which are listed in the write block. In the example below, our outputs are:

  • disease/sars_cov2/SEINRD_model/results/model_output
  • disease/sars_cov2/SEINRD_model/results/figure

SEINRDconfig.yaml:

run_metadata:
  default_input_namespace: BioSS
  description: SEINRD_model
  script: R -f inst/extdata/SEINRD.R --args ${{CONFIG_DIR}}

register:
- namespace: BioSS
  full_name: Biomathematics and Statistics Scotland
  website: https://ror.org/03jwrz939

- external_object: disease/sars_cov2/SEINRD_model/parameters/static_params
  namespace_name: BioSS
  root: https://raw.githubusercontent.com/
  path: FAIRDataPipeline/rSimpleModel/main/inst/extdata/static_params_SEInRD.csv
  title: Static parameters of the model
  description: Static parameters of the model
  unique_name: Simple model parameters - Static parameters of the model
  alternate_identifier_type: simple_model_params
  file_type: csv
  release_date: 2022-01-28T12:00
  version: 1.0.0
  primary: True

- external_object: disease/sars_cov2/SEINRD_model/parameters/rts
  namespace: BioSS
  root: https://raw.githubusercontent.com/
  path: FAIRDataPipeline/rSimpleModel/main/inst/extdata/Rt_beep.csv
  title: Values of Rt at time t
  description: Values of Rt at time t
  unique_name: Simple model parameters - Values of Rt at time t
  alternate_identifier_type: simple_model_params
  file_type: csv
  release_date: 2022-01-28T12:00
  version: 1.0.0
  primary: True

- external_object: disease/sars_cov2/SEINRD_model/parameters/efoi
  namespace: BioSS
  root: https://raw.githubusercontent.com/
  path: FAIRDataPipeline/rSimpleModel/main/inst/extdata/efoi_all_dates.csv
  title: External force of infection at time t
  description: External force of infection at time t
  unique_name: Simple model parameters - External force of infection
  alternate_identifier_type: simple_model_params
  file_type: csv
  release_date: 2022-01-28T12:00
  version: 1.0.0
  primary: True

write:
- data_product: disease/sars_cov2/SEINRD_model/results/model_output
  description: SEINRD model results
  file_type: csv

- data_product: disease/sars_cov2/SEINRD_model/results/figure
  description: SEINRD output plot
  file_type: pdf
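Once fair pull has registered the three parameter files, the code run consumes them through a read block. The base-R sketch below illustrates that register-to-read mapping. It uses plain lists that mirror the entries above. It is not the fair CLI's implementation. The read-entry layout (a data_product name plus a use/version sub-block) and the helper name to_read_block are assumptions for illustration only.

```r
# Sketch only: mirrors the register block of SEINRDconfig.yaml as R lists and
# derives read-block entries from it. The read-entry layout
# (data_product + use$version) is an assumption, not the fair CLI's code.
register <- list(
  list(namespace = "BioSS",
       full_name = "Biomathematics and Statistics Scotland"),
  list(external_object = "disease/sars_cov2/SEINRD_model/parameters/static_params",
       version = "1.0.0"),
  list(external_object = "disease/sars_cov2/SEINRD_model/parameters/rts",
       version = "1.0.0"),
  list(external_object = "disease/sars_cov2/SEINRD_model/parameters/efoi",
       version = "1.0.0")
)

# Hypothetical helper: keep only external objects (namespace entries are not
# data products) and turn each into a read entry pinning its version
to_read_block <- function(register) {
  objects <- Filter(function(entry) !is.null(entry$external_object), register)
  lapply(objects, function(entry) {
    list(data_product = entry$external_object,
         use = list(version = entry$version))
  })
}

read_block <- to_read_block(register)

# Print the derived entries in YAML-like form
cat("read:\n")
for (entry in read_block) {
  cat("- data_product: ", entry$data_product, "\n", sep = "")
  cat("  use:\n")
  cat("    version: ", entry$use$version, "\n", sep = "")
}
```

The namespace entry is dropped because it describes an organisation rather than a data product. The three external objects become read entries that name the data product and the version registered by fair pull.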

The submission script should first call initialise() to set up the code run. It will then typically read in some data using one of the read_*() functions (for internal file formats) or link_read() (for external file formats such as CSV files). The data might then be processed, or a model or analysis carried out. After this, the results should be saved to the local data store via one of the write_*() functions or link_write(). When the code run is complete, finalise() should be called to register all metadata with the local registry.

fair pull

Using the CLI tool, fair pull identifies any data products listed in the register block of config.yaml. These data products are downloaded to the local data store, and their associated metadata is registered in the local registry.

fair init --ci
fair pull inst/extdata/SEINRDconfig.yaml
#> FAIR repository is already initialised.
#> Updating registry from inst/extdata/SEINRDconfig.yaml
#> WARNING:FAIRDataPipeline.ConfigYAML:Remote registry pulls are not yet implemented
#> WARNING:FAIRDataPipeline.ConfigYAML:Remote registry pulls are not yet implemented

The local registry should now contain three data products:

  1. disease/sars_cov2/SEINRD_model/parameters/static_params,
  2. disease/sars_cov2/SEINRD_model/parameters/rts, and
  3. disease/sars_cov2/SEINRD_model/parameters/efoi.

fair run

Again using the CLI tool, fair run performs the code run as written in the submission script. In preparation for this, it translates the user-written config.yaml file for use by the Data Pipeline API. Any variables or wildcards specified by the user in the config file are cross-referenced with the registry, and any data products registered by fair pull are made available for reading by the current code run.
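The script line in run_metadata contains the variable ${{CONFIG_DIR}}. The log below shows the translated config.yaml and script.sh being written to a job directory under data_store/jobs/. It also shows the submission script reading that directory from its first trailing argument. The base-R sketch below illustrates the assumed substitution step. The job directory is copied from the log purely for illustration, and this is not the fair CLI's own code.

```r
# Sketch only: assumes fair run replaces ${{CONFIG_DIR}} with the job
# directory that holds the translated config.yaml and script.sh.
run_template <- "R -f inst/extdata/SEINRD.R --args ${{CONFIG_DIR}}"

# Illustrative job directory, modelled on the path that appears in the log
job_dir <- "data_store/jobs/2022-06-21_14_30_24_873290"

# Literal (non-regex) substitution of the variable
command <- gsub("${{CONFIG_DIR}}", job_dir, run_template, fixed = TRUE)
cat(command, "\n", sep = "")

# The submission script receives job_dir as its first trailing argument
# (commandArgs(trailingOnly = TRUE)[1]) and builds its input paths from it
conf.dir <- job_dir
config <- file.path(conf.dir, "config.yaml")
script <- file.path(conf.dir, "script.sh")
cat(config, script, sep = "\n")
```

The resulting config and script paths are what the submission script passes to initialise().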

fair run inst/extdata/SEINRDconfig.yaml
#> Updating registry from inst/extdata/SEINRDconfig.yaml
#> 
#> R version 4.2.0 (2022-04-22) -- "Vigorous Calisthenics"
#> Copyright (C) 2022 The R Foundation for Statistical Computing
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> 
#> R is free software and comes with ABSOLUTELY NO WARRANTY.
#> You are welcome to redistribute it under certain conditions.
#> Type 'license()' or 'licence()' for distribution details.
#> 
#>   Natural language support but running in an English locale
#> 
#> R is a collaborative project with many contributors.
#> Type 'contributors()' for more information and
#> 'citation()' on how to cite R or R packages in publications.
#> 
#> Type 'demo()' for some demos, 'help()' for on-line help, or
#> 'help.start()' for an HTML browser interface to help.
#> Type 'q()' to quit R.
#> 
#> > library(rSimpleModel)
#> > library(rDataPipeline)
#> > library(deSolve)
#> > library(ggplot2)
#> > 
#> > # Read config directory from command line
#> > conf.dir <- commandArgs(trailingOnly=TRUE)[1]
#> > 
#> > # Initialise code run
#> > config <- file.path(conf.dir, "config.yaml")
#> > script <- file.path(conf.dir, "script.sh")
#> > handle <- initialise(config, script)
#> ℹ Reading config.yaml from data store
#> ✔ Writing /home/runner/work/rSimpleModel/rSimpleModel/data_store/jobs/2022-06-21_14_30_24_873290//config.yaml to local registry
#> ✔ Writing /home/runner/work/rSimpleModel/rSimpleModel/data_store/jobs/2022-06-21_14_30_24_873290//script.sh to local registry
#> ✔ Writing FAIRDataPipeline/rSimpleModel to local registry
#> ✔ Writing new code_run to local registry
#> > 
#> > # Read code run inputs
#> > static_params <- read.csv(link_read(handle, "disease/sars_cov2/SEINRD_model/parameters/static_params"))
#> ℹ Locating 'disease/sars_cov2/SEINRD_model/parameters/static_params'
#> > rts_params <- read.csv(link_read(handle, "disease/sars_cov2/SEINRD_model/parameters/rts"))
#> ℹ Locating 'disease/sars_cov2/SEINRD_model/parameters/rts'
#> > efoi_params <- read.csv(link_read(handle, "disease/sars_cov2/SEINRD_model/parameters/efoi"))
#> ℹ Locating 'disease/sars_cov2/SEINRD_model/parameters/efoi'
#> > 
#> > # Run the model
#> > data <- initialise_SEINRD(rts_params, efoi_params, static_params)
#> > results <- ode(y = data$init_state,
#> +                times = data$time_length,
#> +                func = rSimpleModel::SEINRD_model,
#> +                parms = data$pars)
#> > g <- plot_SEINRD(results)
#> > 
#> > # Save outputs to data store
#> > path <- link_write(handle, "disease/sars_cov2/SEINRD_model/results/model_output")
#> > write.csv(results, path)
#> > 
#> > path <- link_write(handle, "disease/sars_cov2/SEINRD_model/results/figure")
#> > ggsave(path, g, width=20, height=5, units="cm", dpi=600)
#> > 
#> > # Register code run in local registry
#> > finalise(handle)
#> ✔ Writing 'disease/sars_cov2/SEINRD_model/results/model_output' to local registry
#> ✔ Writing 'disease/sars_cov2/SEINRD_model/results/figure' to local registry
#> -> PATCH /api/code_run/1/ HTTP/1.1
#> -> Host: 127.0.0.1:8000
#> -> User-Agent: libcurl/7.68.0 r-curl/4.3.2 httr/1.4.3
#> -> Accept-Encoding: deflate, gzip, br
#> -> Accept: application/json, text/xml, application/xml, */*
#> -> Content-Type: application/json
#> -> Authorization: token d946655533485fed81ce0d9710815a6441f93adb
#> -> Content-Length: 304
#> -> 
#> >> {
#> >>   "inputs": [
#> >>     "http://127.0.0.1:8000/api/object_component/1/",
#> >>     "http://127.0.0.1:8000/api/object_component/2/",
#> >>     "http://127.0.0.1:8000/api/object_component/3/"
#> >>   ],
#> >>   "outputs": [
#> >>     "http://127.0.0.1:8000/api/object_component/7/",
#> >>     "http://127.0.0.1:8000/api/object_component/8/"
#> >>   ]
#> >> }
#> 
#> <- HTTP/1.1 200 OK
#> <- Date: Tue, 21 Jun 2022 14:30:36 GMT
#> <- Server: WSGIServer/0.2 CPython/3.9.13
#> <- Content-Type: application/json
#> <- Vary: Accept, Cookie
#> <- Allow: GET, PUT, PATCH, DELETE, HEAD, OPTIONS
#> <- X-Frame-Options: DENY
#> <- Content-Length: 675
#> <- X-Content-Type-Options: nosniff
#> <- Referrer-Policy: same-origin
#> <- 
#> No encoding supplied: defaulting to UTF-8.
#> >

Outputs

Provenance report