fair CLI functions
Note that this is a living document and the following is subject to change.
A simple example of how the data pipeline should run from the command line:
fair pull config.yaml
fair run config.yaml
fair push config.yaml
fair pull
- download any data required by read: from the remote data store and record metadata in the data registry (whilst editing relevant entries, e.g. storage_root)
- pull metadata associated with all previous versions of these objects listed in write: from the remote data registry
- download any data listed in register: from the original source and record metadata in the data registry
fair run
- read (and validate) the config.yaml file
- generate a working config.yaml file (see Working example)
  - globbing is used to interpret * as all matching objects as well as the original string returned, e.g. if real/data/1 version 0.0.1 and real/data/thing/1 version 0.0.1 already exist in the registry, the user-written config:

        write:
        - data_product: real/data/*
          description: general description for all data products
          use:
            namespace: someone
            version: ${{MINOR}}

    should return:

        write:
        - data_product: real/data/1
          use:
            data_product: real/data/1
            description: general description for all data products
            version: 0.1.0
            namespace: someone
        - data_product: real/data/thing/1
          use:
            data_product: real/data/thing/1
            description: general description for all data products
            version: 0.1.0
            namespace: someone
        - data_product: real/data/*
          use:
            data_product: real/data/*
            description: general description for all data products
            version: 0.0.1
            namespace: someone

  - specific version numbers and any variables in run_metadata:, register:, read:, and write: are replaced with true values, e.g.
    - ${{CONFIG_DIR}} is replaced by the directory within which the working config.yaml file resides
    - release_date: ${{DATETIME}} is replaced by release_date: 2021-04-14T11:34:37, which is a valid form for the registry
    - version: 0.${{DATE}}.0 is replaced by version: 0.20210414.0
    - version: ${{PATCH}} should increment version by patch
    - version: 0.${{DATETIME-%Y%m%d}}.0 or any variants thereof are replaced by an appropriately formatted string
  - if no version is given, then one should be written such that patch is incremented if the data product already exists, otherwise version should be set to 0.0.1
  - register: is removed and external_objects are written to read: as data_products
  - populate public: field in run_metadata: section (default is true)
  - populate version: field in use: section, regardless of whether the user-written config contained the field or not
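The globbing, variable and version rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the fair implementation: the helper names (bump, substitute, resolve_version, expand_pattern) are hypothetical, the registry is modelled as a plain dict mapping each data product name to its latest version, and the default ${{DATETIME}} format is inferred from the release_date example.

```python
import fnmatch
import re
from datetime import datetime

# Matches ${{NAME}} or ${{NAME-format}}, e.g. ${{DATE}} or ${{DATETIME-%Y%m%d}}.
TOKEN = re.compile(r"\$\{\{(\w+)(?:-([^}]+))?\}\}")


def bump(version, part):
    """Increment the MINOR or PATCH component of a major.minor.patch string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "MINOR":
        return f"{major}.{minor + 1}.0"
    if part == "PATCH":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported increment: {part}")


def substitute(text, now, config_dir):
    """Replace date, datetime and config-directory variables in a string."""
    def repl(match):
        name, fmt = match.group(1), match.group(2)
        if name == "DATETIME":
            # Assumed default format, taken from the release_date example.
            return now.strftime(fmt or "%Y-%m-%dT%H:%M:%S")
        if name == "DATE":
            return now.strftime("%Y%m%d")
        if name == "CONFIG_DIR":
            return config_dir
        return match.group(0)  # leave unknown variables untouched
    return TOKEN.sub(repl, text)


def resolve_version(template, latest, now):
    """Pick the version to write, given the latest registered version (or None)."""
    if template is None:
        template = "${{PATCH}}"  # no version given: increment patch by default
    match = TOKEN.fullmatch(template)
    if match and match.group(1) in ("MINOR", "PATCH"):
        # New data products start at 0.0.1; existing ones are incremented.
        return bump(latest, match.group(1)) if latest else "0.0.1"
    return substitute(template, now, config_dir="")


def expand_pattern(pattern, registry, template, now):
    """Return (data_product, version) pairs for all matches plus the pattern itself."""
    names = sorted(n for n in registry if fnmatch.fnmatchcase(n, pattern))
    if pattern not in names:
        names.append(pattern)  # the original string is also returned
    return [(n, resolve_version(template, registry.get(n), now)) for n in names]
```

Called with the pattern real/data/*, the template ${{MINOR}} and a registry holding real/data/1 and real/data/thing/1 at version 0.0.1, expand_pattern yields 0.1.0 for both existing products and 0.0.1 for the pattern itself, in line with the example above. Note that fnmatch lets * match across / separators, which is what the example requires.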
- local_repo: must always be given in the config.yaml file
  - ensure the repo is clean
  - get the hash of the latest commit and add to the working config.yaml file in run_metadata: latest_commit:
  - if run_metadata: remote_repo: is false, then fair push should copy the repo to the file store
  - if run_metadata: remote_repo: is absent or doesn’t contain a URL, then fair run should try to get the remote repo URL from the local repo
  - note that there are exceptions and the user may reference a script located outside of a repository
- save the working config.yaml file in the local data store, in <local_store>/coderun/<date>T<time>/config.yaml, e.g. datastore/coderun/20210625T165552/config.yaml
- save the submission script to the local data store in <local_store>/coderun/<date>T<time>/script.sh
  - note that config.yaml should contain either script: that should be saved as the submission script, or script_path: that points to the file that should be saved as the submission script
- save the path to <local_store>/coderun/<date>T<time>/ in the global environment as $FDP_CONFIG_DIR so that it can be picked up by the script that is run after this has been completed
- execute the submission script
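The last steps of fair run (saving the working config and submission script, exporting $FDP_CONFIG_DIR, and executing the script) might look roughly like the following. This is only a sketch: the function names prepare_run and execute are hypothetical, the directory timestamp format is inferred from the 20210625T165552 example, and running the script with sh is an assumption.

```python
import os
import subprocess
from datetime import datetime
from pathlib import Path


def prepare_run(local_store, working_config, script, now):
    """Write config.yaml and script.sh to <local_store>/coderun/<date>T<time>/."""
    run_dir = Path(local_store) / "coderun" / now.strftime("%Y%m%dT%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.yaml").write_text(working_config)
    (run_dir / "script.sh").write_text(script)
    return run_dir


def execute(run_dir):
    """Run the submission script with FDP_CONFIG_DIR set to the run directory."""
    env = dict(os.environ, FDP_CONFIG_DIR=str(run_dir))
    # Assumption: the submission script is a POSIX shell script.
    return subprocess.run(
        ["sh", str(run_dir / "script.sh")],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
```

Here the variable is passed through the child process environment rather than set globally, which is enough for the submission script to pick it up. The script text would come from either script: or the file named by script_path: in config.yaml.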
fair push
- push new files (generated from write: and register:) to the remote data store
- record metadata in the data registry (whilst editing relevant entries, e.g. storage_root)
