orderly 1.0.1 released to CRAN

Oct 18, 2019

Rich FitzJohn

We are pleased to announce the first public release of orderly, our reproducible reporting framework, implemented as an R package and now available on CRAN.

The orderly package was designed to help with a common pattern in reporting, where a piece of analysis might be run multiple times (over, say, a number of weeks or months) and where the inputs to the analysis might change. Any piece of analysis has numerous inputs (data and metadata, source code and packages), and a change in any of these might change the results of the analysis.

In our case at VIMC this is analysis of the impact of vaccines, but this pattern exists in many fields. When reports are run multiple times they will inevitably vary, but we want to understand why they vary: was it a change in the analysis code, in the input data, or in some other dependency of the report? Critically, we want it to always be completely clear which set of inputs produced a given set of outputs, without relying on any discipline from the end-user.

The principal idea in orderly is that if the user lists the required inputs and expected outputs of an analysis, then we can automate many tasks. The user must write a small configuration file¹ like

script: script.R

resources:
  - data.csv

artefacts:
  - staticgraph:
      description: A graph of things
      filenames: mygraph.png

indicating the script to be run, any additional files needed to run the script, and the files that will be produced by running the script. After that, orderly imposes no strong restrictions on what goes into script.R. As such it is designed to accept R analyses that might have started life as standalone analyses as much as ones that were developed specifically for use within orderly.
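
As a minimal sketch of what a script.R matching this configuration might look like: the column names x and y and the contents of data.csv are made up for illustration, and the temporary directory exists only so that the example runs on its own. Inside a real report, only the lines between the two markers would be needed.

```r
# Hypothetical example: create a throwaway working directory and a made-up
# data.csv so that the sketch is runnable outside of any real report.
workdir <- tempfile("report_")
dir.create(workdir)
old_wd <- setwd(workdir)

write.csv(data.frame(x = 1:10, y = (1:10)^2), "data.csv", row.names = FALSE)

# --- contents of a hypothetical script.R start here ---
# Read the resource declared under "resources:"
dat <- read.csv("data.csv")

# Write the file declared under "artefacts:"
png("mygraph.png")
plot(dat$x, dat$y, type = "b", xlab = "x", ylab = "y",
     main = "A graph of things")
invisible(dev.off())
# --- contents of a hypothetical script.R end here ---

# Restore the original working directory
setwd(old_wd)
```

Note that the script itself uses only plain relative paths and ordinary R functions, which is what allows an existing standalone analysis to be reused with little or no modification.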

The basic idea is described in more detail on the package web page and in the introductory vignette.

We have borrowed ideas from version control of source files to create a system where multiple versions of analyses can be compared side-by-side and where outputs of an analysis are always stored alongside their inputs. We have used orderly in two large collaborative projects since 2017 and continue to actively improve the package.

Core features include:

the ability to use data from SQL databases in reports
management of reports that depend on previously run reports (perhaps multiple or specific versions)
no assumptions about the sort of analyses that are run within a report, so most source code needs no changes
automatic hashing of all inputs and outputs, with the hashes stored (along with information on all loaded R packages and the current session) alongside the outputs and in a database
a simple directory layout that is designed to minimise git conflicts and streamline collaboration
a web front-end, OrderlyWeb, which can be used to create a user-friendly interface to the system and support a centralised workflow (see the “remote” vignette)
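
To illustrate the idea behind hashing inputs and recording session information, here is a rough sketch using only base R (tools::md5sum and sessionInfo). It is not orderly's internal implementation; the file contents and the shape of the `record` list are assumptions made purely for illustration.

```r
# Sketch only: base R tools, not orderly's own code.
workdir <- tempfile("hash_demo_")
dir.create(workdir)
input <- file.path(workdir, "data.csv")

# A made-up input file
write.csv(data.frame(x = 1:3), input, row.names = FALSE)
hash_before <- unname(tools::md5sum(input))

# Changing the input changes its hash, so differing hashes between two
# runs signal that the input data differed.
write.csv(data.frame(x = 1:4), input, row.names = FALSE)
hash_after <- unname(tools::md5sum(input))

# Capture information about the current R session
session <- sessionInfo()

# A hypothetical record that could be stored next to the outputs
record <- list(
  inputs = c(data.csv = hash_after),
  r_version = session$R.version$version.string,
  base_packages = session$basePkgs
)
```

Comparing such records between two runs of the same report is one way to tell whether a difference in results came from the data, the code, or the software environment.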

We take backward compatibility very seriously and have developed a system for safely migrating any changes to the internal formats used, including running these migrations against reference data during automated testing.

Install orderly from CRAN with

install.packages("orderly")

¹ Currently this file is in YAML format because that suits our workflows, but it would be straightforward to replace this with a special R function, or even to generate the configuration automatically (and transparently) for stereotyped uses like compiling markdown files.

Reside-IC