DPAPI examples #

Note that this is a living document and the following is subject to change.

This page gives simple examples of the user-written config.yaml file alongside the working config.yaml file generated by fair run. Note that the Data Pipeline API takes the working config.yaml file as its input.

Empty code run #

User written config.yaml #

run_metadata:
  description: An empty code run
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/empty_script.R ${{CONFIG_DIR}}

Working config.yaml #

fair run should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the user-written config.yaml file: ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides, and latest_commit and remote_repo are added to run_metadata:.

run_metadata:
  description: An empty code run
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/empty_script.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
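The placeholder substitution described above can be sketched in base R. This only illustrates the text replacement and is not the CLI's actual implementation; substitute_config_dir() is a hypothetical helper name.

```r
# Hypothetical helper: replace the ${{CONFIG_DIR}} placeholder in a script
# line with the directory that holds the working config.yaml.
substitute_config_dir <- function(script_line, config_dir) {
  # fixed = TRUE treats the placeholder as a literal string, not a regex
  gsub("${{CONFIG_DIR}}", config_dir, script_line, fixed = TRUE)
}

user_script <- "R -f simple_working_examples/empty_script.R ${{CONFIG_DIR}}"
config_dir <- "/Users/SoniaM/datastore/coderun/20210511-231444/"

working_script <- substitute_config_dir(user_script, config_dir)
print(working_script)
```

The result should match the script: entry in the working config.yaml shown above.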

Submission script (R) #

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

finalise(handle)
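The submission scripts on this page all build their paths from the FDP_CONFIG_DIR environment variable. A minimal sketch of a guard that fails early when the variable is missing is given below, assuming the variable is set by fair run before the script starts. fdp_paths() is a hypothetical helper, not part of rDataPipeline, and the directory used is a stand-in.

```r
# Hypothetical helper: build the two paths passed to initialise(), stopping
# with a clear message when FDP_CONFIG_DIR is not set.
fdp_paths <- function() {
  config_dir <- Sys.getenv("FDP_CONFIG_DIR", unset = NA)
  if (is.na(config_dir) || !nzchar(config_dir)) {
    stop("FDP_CONFIG_DIR is not set; was this script started by fair run?")
  }
  list(config = file.path(config_dir, "config.yaml"),
       script = file.path(config_dir, "script.sh"))
}

# Stand-in value for illustration only (assumed to be set by the CLI)
Sys.setenv(FDP_CONFIG_DIR = "/tmp/coderun/20210511-231444")
paths <- fdp_paths()
print(paths$config)
print(paths$script)
```

The two list elements can then be passed on as initialise(paths$config, paths$script).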

Write data product (HDF5) #

User written config.yaml #

run_metadata:
  description: Write an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/write_array.R ${{CONFIG_DIR}}

write:
- data_product: test/array
  description: test array with simple data

Working config.yaml #

fair run should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the user-written config.yaml file: ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides, public, latest_commit and remote_repo are added to run_metadata:, and a use: block containing the version is added to the write: entry.

run_metadata:
  description: Write an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/write_array.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

write:
- data_product: test/array
  description: test array with simple data
  use:    
    version: 0.1.0

Note that, although use: is reserved for aliasing in the user-written config, for simplicity the CLI will always write version here.

Note also that, by default, the CLI will write public: true to run_metadata:. The user is, however, free to specify public: false for individual writes.
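One way to picture the interaction between the run-level default and a per-write override is the following base R sketch. It assumes that a public: key placed inside an individual write: entry takes precedence over the value in run_metadata:. Both that placement and the second data product name are assumptions made for illustration.

```r
# Run-level default, as written by the CLI to run_metadata:
run_metadata <- list(public = TRUE)

# Two write entries; the second overrides the default.
# "test/private-array" is a made-up name for this sketch.
writes <- list(
  list(data_product = "test/array"),
  list(data_product = "test/private-array", public = FALSE)
)

# Assumed rule: a per-write value wins over the run-level default
is_public <- vapply(writes, function(w) {
  if (is.null(w$public)) run_metadata$public else w$public
}, logical(1))

print(is_public)
```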

Submission script (R) #

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

write_array(array = as.matrix(df),
            handle = handle,
            data_product = "test/array",
            component = "component1/a/s/d/f/s",
            description = "Some description",
            dimension_names = list(rowvalue = rownames(df),
                                   colvalue = colnames(df)),
            dimension_values = list(NA, 10),
            dimension_units = list(NA, "km"),
            units = "s")

finalise(handle)
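Before calling write_array(), it can help to confirm that dimension_names has one entry per array dimension and that each entry has the right length. The check below is a minimal base R sketch. check_dimension_names() is a hypothetical helper, and whether write_array() performs a similar validation internally is not stated here.

```r
df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

# The 2 x 2 matrix that is passed as the array argument above
m <- as.matrix(df)

dimension_names <- list(rowvalue = rownames(df),
                        colvalue = colnames(df))

# Hypothetical helper: one name vector per dimension, each as long as
# the dimension it describes
check_dimension_names <- function(array, dimension_names) {
  dims <- dim(array)
  ok <- length(dimension_names) == length(dims) &&
    all(lengths(dimension_names) == dims)
  if (!ok) stop("dimension_names must give one name per row and per column")
  invisible(TRUE)
}

check_dimension_names(m, dimension_names)
print(dim(m))
```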

Read data product (HDF5) #

User written config.yaml #

run_metadata:
  description: Read an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/read_array.R ${{CONFIG_DIR}}

read:
- data_product: test/array

Working config.yaml #

fair run should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the user-written config.yaml file: ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides, latest_commit and remote_repo are added to run_metadata:, and a use: block containing the version is added to the read: entry.

run_metadata:
  description: Read an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |- 
    R -f simple_working_examples/read_array.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

read:
- data_product: test/array
  use: 
    version: 0.1.0

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

data_product <- "test/array"
component <- "component1/a/s/d/f/s"

dat <- read_array(handle = handle,
                  data_product = data_product,
                  component = component)

finalise(handle)

Write data product (csv) #

User written config.yaml #

run_metadata:
  description: Write csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
        R -f simple_working_examples/write_csv.R ${{CONFIG_DIR}}

write:
- data_product: test/csv
  description: test csv file with simple data
  file_type: csv

Working config.yaml #

run_metadata:
  description: Write csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/ 
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
        R -f simple_working_examples/write_csv.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

write:
- data_product: test/csv
  description: test csv file with simple data
  file_type: csv
  use:    
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

path <- link_write(handle, "test/csv")

write.csv(df, path)

finalise(handle)

Read data product (csv) #

User written config.yaml #

run_metadata:
  description: Read csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
        R -f simple_working_examples/read_csv.R ${{CONFIG_DIR}}

read:
- data_product: test/csv

Working config.yaml #

run_metadata:
  description: Read csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
        R -f simple_working_examples/read_csv.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

read:
- data_product: test/csv
  use:
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

path <- link_read(handle, "test/csv")
df <- read.csv(path)

finalise(handle)
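A detail worth knowing when pairing the csv write and read examples: write.csv() writes row names by default, so the file read back with read.csv() gains an extra leading column. The base R sketch below shows the round trip, using tempfile() as a stand-in for the path that link_write() and link_read() would return.

```r
df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

# Stand-in for the path returned by link_write() / link_read()
path <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column,
# which read.csv() then names "X"
write.csv(df, path)
with_rownames <- read.csv(path)
print(names(with_rownames))

# Dropping row names on write gives back only the original columns
write.csv(df, path, row.names = FALSE)
without_rownames <- read.csv(path)
print(names(without_rownames))
```

Whether to keep the row names depends on whether downstream readers expect them.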

Write data product (point estimate) #

User written config.yaml #

run_metadata:
  description: Write point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/write_point_estimate.R ${{CONFIG_DIR}}

write:
- data_product: test/estimate/asymptomatic-period
  description: asymptomatic period

Working config.yaml #

run_metadata:
  description: Write point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/write_point_estimate.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

write:
- data_product: test/estimate/asymptomatic-period
  description: asymptomatic period
  use:    
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

write_estimate(value = 9,
               handle = handle,
               data_product = "test/estimate/asymptomatic-period",
               component = "asymptomatic-period",
               description = "asymptomatic period")

finalise(handle)

Read data product (point estimate) #

User written config.yaml #

run_metadata:
  description: Read point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/read_point_estimate.R ${{CONFIG_DIR}}

read:
- data_product: test/estimate/asymptomatic-period

Working config.yaml #

run_metadata:
  description: Read point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/read_point_estimate.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

read:
- data_product: test/estimate/asymptomatic-period
  use:
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

read_estimate(handle = handle,
              data_product = "test/estimate/asymptomatic-period",
              component = "asymptomatic-period")

finalise(handle)

Write data product (distribution) #

User written config.yaml #

run_metadata:
  description: Write distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/write_distribution.R ${{CONFIG_DIR}}

write:
- data_product: test/distribution/symptom-delay
  description: Estimate of symptom delay

Working config.yaml #

run_metadata:
  description: Write distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/write_distribution.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

write:
- data_product: test/distribution/symptom-delay
  description: Estimate of symptom delay
  use:    
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

write_distribution(handle = handle,
                   data_product = "test/distribution/symptom-delay",
                   component = "symptom-delay",
                   distribution = "Gaussian",
                   parameters = list(mean = -16.08, SD = 30),
                   description = "symptom delay")

finalise(handle)
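To get a feel for the distribution being written, the parameters can be explored with base R's normal-distribution functions. This sketch assumes that SD denotes the standard deviation of the Gaussian. It does not touch the registry.

```r
# Parameters as passed to write_distribution() above
parameters <- list(mean = -16.08, SD = 30)

# Probability that the delay is negative under this Gaussian
p_negative <- pnorm(0, mean = parameters$mean, sd = parameters$SD)

# Central 95% interval of the delay
interval <- qnorm(c(0.025, 0.975),
                  mean = parameters$mean,
                  sd = parameters$SD)

print(p_negative)
print(interval)
```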

Read data product (distribution) #

User written config.yaml #

run_metadata:
  description: Read distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/read_distribution.R ${{CONFIG_DIR}}

read:
- data_product: test/distribution/symptom-delay

Working config.yaml #

run_metadata:
  description: Read distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
        R -f simple_working_examples/read_distribution.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation

read:
- data_product: test/distribution/symptom-delay
  use:
    version: 0.0.1

Submission script (R) #

library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

read_distribution(handle = handle,
                  data_product = "test/distribution/symptom-delay",
                  component = "symptom-delay")

finalise(handle)

Attach issue to component #

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
write:
- data_product: test/array/issues/component
  description: a test array

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R /Users/username/datastore/coderun/20210511-231444/
  public: true
  latest_commit: e3c0ebdf5ae079bd72f601ec5eefdf998c4fc8ec
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: test/array/issues/component
  description: a test array
  use:
    version: 0.1.0

Submission script (R) #

In R, we can attach issues to components in different ways. If there’s a more elegant way to do this, please tell me!

Attaching an issue on the fly by referencing an index in the handle:

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

component_id <- write_array(array = as.matrix(df),
                            handle = handle,
                            data_product = "test/array/issues/component",
                            component = "component1/a/s/d/f/s",
                            description = "Some description",
                            dimension_names = list(rowvalue = rownames(df),
                                                   colvalue = colnames(df)),
                            dimension_values = list(NA, 10),
                            dimension_units = list(NA, "km"),
                            units = "s")

issue <- "some issue"
severity <- 7

raise_issue(index = component_id,
            handle = handle,
            issue = issue,
            severity = severity)

finalise(handle)

Attaching an issue to a data product that already exists in the data registry by referencing it explicitly:

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

issue <- "some issue"
severity <- 7

raise_issue(handle = handle,
            data_product = "test/array/issues/component",
            component = "component1/a/s/d/f/s",
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)

finalise(handle)

Attaching an issue to multiple components at the same time:

raise_issue(index = c(component_id1, component_id2),
            handle = handle,
            issue = issue,
            severity = severity)

or

raise_issue(handle = handle,
            data_product = "test/array/issues/component",
            component = c("component1/a/s/d/f/s", "component2/a/s/d/f/s"),
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)

Attach issue to whole data product #

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
write:
- data_product: "test/array/issues/whole"
  description: a test array
  file_type: csv

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R /Users/username/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 40725b40252fd55ba355f7ed66f5a42387f1674f
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: test/array/issues/whole
  description: a test array
  file_type: csv
  use:
    version: 0.1.0

Submission script (R) #

In R, we can attach issues to data products in different ways. If there’s a more elegant way to do this, please tell me!

Attaching an issue on the fly by referencing an index in the handle:

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

index <- write_array(array = as.matrix(df),
                     handle = handle,
                     data_product = "test/array/issues/whole",
                     component = "component1/a/s/d/f/s",
                     description = "Some description",
                     dimension_names = list(rowvalue = rownames(df),
                                            colvalue = colnames(df)))

write_array(array = as.matrix(df),
            handle = handle,
            data_product = "test/array/issues/whole",
            component = "component2/a/s/d/f/s",
            description = "Some description",
            dimension_names = list(rowvalue = rownames(df),
                                   colvalue = colnames(df)))

issue <- "some issue"
severity <- 7

raise_issue(index = index,
            handle = handle,
            issue = issue,
            severity = severity,
            whole_object = TRUE)

finalise(handle)

Attaching an issue to a data product that already exists in the data registry by referencing it explicitly:

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

issue <- "some issue"
severity <- 7

raise_issue(handle = handle,
            data_product = "test/array/issues/whole",
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)

finalise(handle)

Attaching an issue to multiple data products at the same time:

raise_issue(index = c(index1, index2),
            handle = handle,
            issue = issue,
            severity = severity,
            whole_object = TRUE)

or

raise_issue(handle = handle,
            data_product = c("test/array/issues/whole", "test/array/issues/whole/2"),
            version = c("0.1.0", "0.1.0"),
            namespace = "username",
            issue = issue,
            severity = severity)

Attach issue to config #

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R /Users/username/datastore/coderun/20210511-231444/
  latest_commit: 0d98e732b77e62a6cd390c6aec655f260f5f9b33
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []

Submission script (R) #

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

config_issue <- "issue with config"
config_severity <- 7

raise_issue_config(handle = handle,
                   issue = config_issue,
                   severity = config_severity)

finalise(handle)

Attach issue to submission script #

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R /Users/username/datastore/coderun/20210511-231444/
  latest_commit: 358f64c4044f3b3f761865ee8e9f4375cf41d155
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []

Submission script (R) #

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

script_issue <- "issue with script"
script_severity <- 7

raise_issue_script(handle = handle,
                   issue = script_issue,
                   severity = script_severity)

finalise(handle)

Attach issue to GitHub repository #

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/attach_issue.R /Users/username/datastore/coderun/20210511-231444/
  latest_commit: 6b23ec822bfd7ea5f419c70ce18fb73b59c90754
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []

Submission script (R) #

library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

repo_issue <- "issue with repo"
repo_severity <- 7

raise_issue_repo(handle = handle,
                 issue = repo_issue,
                 severity = repo_severity)

finalise(handle)

Attach issue to external object #

This is not something we want to do.

Attach issue to code run #

This might be something we want to do in the future, but not now.

Delete DataProduct (optionally) if identical to previous version #

Delete CodeRun (optionally) if nothing happened #

That is, if no output was created and no issue was raised.

CodeRun with aliases (use block example) #

User written config.yaml #

run_metadata:
  description: A test model
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: SCRC
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata
  script: |- 
    R -f inst/SCRC/scotgov_management/submission_script.R ${{CONFIG_DIR}}

read:
- data_product: test/data/alias
  use:
    namespace: johnsmith
    data_product: scotland/human/population

write:
- data_product: human/outbreak-timeseries
  description: data product description
  use:
    data_product: scotland/human/outbreak-timeseries
- data_product: human/outbreak/simulation_run
  description: another data product description
  use:
    data_product: human/outbreak/simulation_run-${{RUN_ID}}

Working config.yaml #

fair run should create a working config.yaml file, which is read by the Data Pipeline API. In this example, ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides, public, latest_commit and remote_repo are added to run_metadata:, and a version is added to each use: block.

run_metadata:
  description: A test model
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: SCRC
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  public: true
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/ScottishCovidResponse/SCRCdata
  script: |- 
    R -f inst/SCRC/scotgov_management/submission_script.R /Users/SoniaM/datastore/coderun/20210511-231444/

read:
- data_product: test/data/alias
  use:
    data_product: scotland/human/population
    version: 0.1.0
    namespace: johnsmith

write:
- data_product: human/outbreak-timeseries
  description: data product description
  use:
    data_product: scotland/human/outbreak-timeseries
    version: 0.1.0
- data_product: human/outbreak/simulation_run
  description: another data product description
  use:
    data_product: human/outbreak/simulation_run-${{RUN_ID}}    
    version: 0.1.0
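The effect of a use: block can be sketched in base R as a simple override: values in use: replace the name given in the read: entry and the default namespace from run_metadata:. This is an illustration of the aliasing idea under that assumption, not the API's actual lookup code; resolve_read() is a hypothetical helper. The entry is the read: item from the user-written config, with the version that the CLI adds.

```r
# Hypothetical helper: work out which registry entry a read: item points to.
# Values in the use: block override the entry's own name and the defaults.
resolve_read <- function(entry, default_namespace) {
  target <- list(data_product = entry$data_product,
                 namespace = default_namespace)
  if (!is.null(entry$use)) target <- modifyList(target, entry$use)
  target
}

entry <- list(
  data_product = "test/data/alias",
  use = list(data_product = "scotland/human/population",
             version = "0.1.0",
             namespace = "johnsmith")
)

target <- resolve_read(entry, default_namespace = "SCRC")
str(target)
```

In this sketch the script refers to the data product by the name given in the read: entry, while the lookup uses the aliased name, namespace and version.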

CodeRun with read globbing #

This example makes use of globbing in the read: block.

First we need to populate our local registry with something to read:

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/input_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  description: A csv file
  file_type: csv
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  description: A csv file
  file_type: csv

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/input_globbing.R /Users/username/datastore/coderun/20210511-231444/
  public: yes
  latest_commit: 064e900b691e80058357a344f02cf73de0166fab
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1

Now that our local registry is populated, we can try globbing:

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/input_globbing.R ${{CONFIG_DIR}}
read:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/*

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/input_globbing.R /Users/username/datastore/coderun/20210511-231444/
  latest_commit: b9e2187b3796f06ca33f92c3a82863215917ed0e
  remote_repo: https://github.com/fake_org/fake_repo
read:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  use:
    version: 0.0.1
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  use:
    version: 0.0.1
write: []
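
The expansion of the read: pattern above can be sketched in Python. This is an illustration under stated assumptions, not the FAIR CLI's implementation: the `registry` dictionary and the `expand_read_entry` function are hypothetical, the `real/other/1` entry is invented to show a non-match, and `fnmatchcase` is used because its `*` also matches across `/`, which agrees with `thing/1` being picked up in the working config.yaml.

```python
from fnmatch import fnmatchcase

# Hypothetical snapshot of the local registry: data product name -> registered version.
registry = {
    "real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1": "0.0.1",
    "real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1": "0.0.1",
    "real/other/1": "0.0.1",  # invented entry that should not match
}


def expand_read_entry(pattern, registry):
    """Return one read entry per registered data product whose name matches the pattern."""
    return [
        {"data_product": name, "use": {"version": version}}
        for name, version in registry.items()
        if fnmatchcase(name, pattern)
    ]


entries = expand_read_entry(
    "real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/*", registry
)
for entry in entries:
    print(entry)
```

The two matching data products correspond to the two read: entries in the working config.yaml; the order of the entries carries no meaning in this sketch.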

CodeRun with write globbing #

This example makes use of globbing in the write: block.

First we need to populate the local registry with some data:

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/output_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/output_globbing.R /Users/username/datastore/coderun/20210511-231444/
  public: yes
  latest_commit: 2a8688677321b99e3a2545ce020992d136334b71
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1

Now that our local registry is populated, we can try globbing:

User written config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/output_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/*
  description: A csv file
  file_type: csv
  use:
    version: ${{MAJOR}}

Working config.yaml #

run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.scrc.uk/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
        R -f simple_working_examples/output_globbing.R /Users/username/datastore/coderun/20210511-231444/
  public: yes
  latest_commit: f95815976cd4d93c062f94a48525fcec88b6ef34
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/*
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
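
In the working config.yaml above, `version: ${{MAJOR}}` becomes 1.0.0 for data products previously registered at 0.0.1. The following Python sketch shows one way such a major increment could be computed. The `bump` and `resolve_version` functions are hypothetical names, and the rule applied (increment the major component and reset the lower ones) is an assumption inferred from this example, not a statement of how the FAIR CLI is implemented.

```python
def bump(version, part):
    """Increment one component of a MAJOR.MINOR.PATCH version, zeroing the lower ones."""
    numbers = [int(n) for n in version.split(".")]
    index = ("major", "minor", "patch").index(part)
    numbers[index] += 1
    numbers[index + 1:] = [0] * (2 - index)
    return ".".join(str(n) for n in numbers)


def resolve_version(requested, latest):
    """Resolve a `use: version:` value against the latest registered version."""
    if requested == "${{MAJOR}}":
        return bump(latest, "major")
    return requested  # an explicit version such as 0.1.0 is kept as written


print(resolve_version("${{MAJOR}}", "0.0.1"))
```

With the registered version 0.0.1 from the first run, this prints 1.0.0, matching the version: values in the working config.yaml.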