
Multi-language pipelines with rixpress

Categories: R, nix

Published: May 13, 2025

If you want to watch a 2-minute video introduction to {rixpress}, click the image below:

Video Thumbnail

In August last year I tried to see how one could use Nix as a build automation tool for data science pipelines, and in March this year I started working on an R package that would make setting up such pipelines easy, which I already discussed in my previous post.

After some weeks of work, I think that {rixpress} is at a stage where it can already be quite useful to a lot of people. {rixpress} helps you set up your projects as a pipeline of completely reproducible steps. {rixpress} is a sister package to {rix} and together they make true computational reproducibility easier to achieve. {rix} makes it easy to capture and rebuild the exact computational environment in which the code was executed, and {rixpress} helps you move away from script-based workflows that can be difficult to execute and may require manual intervention.

When I first introduced {rixpress}, it was essentially a proof of concept. It could manage some basic R and Python interplay, but it was clearly in its early stages. Since then, I’ve added some features that I think really show why using Nix as the underlying build engine is a good idea.

Just like for its sister package {rix}, I’ve taken the step to submit {rixpress} for peer review by rOpenSci. {rix} really benefitted from rOpenSci’s peer review and I believe that it’ll be the same for {rixpress}.

Current Capabilities of {rixpress}

Here are the features currently available in {rixpress}:

  • A key motivation was to simplify building pipelines where different steps might require different language environments. With {rixpress}, this is a central feature:

      • Define steps in R (rxp_r(), rxp_r_file()) or Python (rxp_py(), rxp_py_file()).

      • Importantly, each step can be configured to run in its own Nix-defined environment (for example, use nix_env = "my-python-env.nix" for a Python step, or nix_env = "my-r-env.nix" for an R step). These environments can be generated using my other package, {rix}.

      • Pass data between R and Python steps. {rixpress} manages the serialization, using reticulate by default for R/Python object conversion, and also allows custom functions for other formats like JSON or model-specific files.

  • Build Quarto (or R Markdown) documents using rxp_quarto() (and rxp_rmd()). These documents can access any artifact (rxp_read("my_artifact")) from preceding steps, regardless of the language used to generate it. Quarto rendering can also occur within its own dedicated Nix environment.

  • Every step in a {rixpress} pipeline is treated as a Nix derivation. This means hermetic builds, sandboxed execution, and content-addressable caching, leading to a high degree of reproducibility (as expected with Nix).

  • As pipelines grow, visualization is helpful. rxp_ggdag() (using {ggdag}) and rxp_visnetwork() (using {visNetwork}) provide a visual overview of dependencies. dag_for_ci() exports the DAG to a DOT file using {igraph}, which can then be used for text-based visualization on CI.

  • For CI, rxp_ga() can generate a GitHub Actions workflow to run the pipeline on each push. This workflow includes caching of Nix store paths between runs (using export_nix_archive() and import_nix_archive()) to avoid unnecessary rebuilds.

  • There is ample documentation, and even a vignette detailing how to use {cmdstanr} within a {rixpress} pipeline. {cmdstanr} works in a specific way, by compiling Stan models to C++, so model compilation and sampling have to be managed carefully within the Nix sandbox. The vignette demonstrates that complex tools can be integrated.

  • It is possible to retrieve outputs from previous pipeline executions. {rixpress} maintains timestamped build logs. Functions like rxp_list_logs(), rxp_inspect(which_log = "..."), and rxp_read("derivation_name", which_log = "...") allow you to access the history of your pipeline’s execution and retrieve specific artifacts.
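The caching behaviour mentioned in the list above can be illustrated with a toy sketch. The Python below is not {rixpress} or Nix code: step_key, ToyStore and run_pipeline are hypothetical names made up for this illustration. It assumes a simplified model in which a step is identified by a hash of its code and of the keys of its inputs, so an unchanged step is served from the cache, while a change upstream forces everything downstream to rebuild. How Nix actually computes store paths is more involved.

```python
import hashlib


def step_key(code, input_keys):
    """Return a hex digest identifying a step by its code and its inputs' keys."""
    digest = hashlib.sha256()
    digest.update(code.encode("utf-8"))
    for key in input_keys:
        digest.update(b"\0")  # separator so adjacent fields cannot run together
        digest.update(key.encode("utf-8"))
    return digest.hexdigest()


class ToyStore:
    """Hypothetical in-memory stand-in for a build cache (not the Nix store)."""

    def __init__(self):
        self.outputs = {}  # key -> cached result
        self.builds = 0    # number of times a step was actually executed

    def build(self, code, func, inputs=()):
        """Run func on upstream values unless this exact step is already cached.

        `inputs` is a sequence of (key, value) pairs returned by earlier calls.
        """
        key = step_key(code, [k for k, _ in inputs])
        if key not in self.outputs:
            self.builds += 1
            self.outputs[key] = func(*[v for _, v in inputs])
        return key, self.outputs[key]


def run_pipeline(store, load_code):
    """Two-step pipeline: load some numbers, then sum them."""
    raw = store.build(load_code, lambda: [1, 2, 3])
    return store.build("sum(raw)", lambda xs: sum(xs), [raw])


store = ToyStore()
run_pipeline(store, "load v1")  # both steps execute
run_pipeline(store, "load v1")  # nothing changed: both steps come from the cache
run_pipeline(store, "load v2")  # upstream code changed: both steps execute again
print(store.builds)
```

Across the three runs the counter ends at 4: the second run triggers no builds at all, and the third rebuilds the downstream step even though its own code did not change, because the key of its input did.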

An Invitation for Feedback

Considerable effort has gone into making {rixpress} robust and useful. A collection of examples is available at the rixpress_demos GitHub repository to illustrate various use cases (R-only, Python-only, R/Python, Quarto, {cmdstanr}, and an XGBoost example).

I’m now looking for feedback from users:

  • I encourage you to try it out. I recommend watching this tutorial video to get started quickly.
  • Install it, explore the examples, and perhaps apply it to one of your projects.
  • Any observations on what works well, what might be confusing, or any issues encountered would be helpful.
  • Your feedback would be very valuable. Please feel free to open an issue on the {rixpress} GitHub repository with bug reports, feature suggestions, or questions.

Why use {rixpress} instead of {targets}?

{targets} is a fantastic package, and the main source of inspiration for {rixpress}. If you have no need for multi-language pipelines, then running {targets} inside of a Nix environment, as described here, is perfectly valid. But I think that {rixpress} has its place if:

  • you need to use multiple languages, as you don’t need to adapt Python code to work with {reticulate},
  • you’re already convinced by Nix and use {rix},
  • you want to use a simple pipeline tool with a smaller scope.