Welcome to my blog where I talk about R, Nix, Econometrics and Data Science. If you enjoy reading what I write, you might enjoy my books or want to follow me on Mastodon, Twitter or Bluesky. If you are 40+, click here instead. I also make videos on YouTube.
2025
- Orchestrating Polyglot, Reproducible Data Science with Nix and {rixpress}
- Python needs its CRAN
- You can outsource the grunt work to an LLM, not expertise
- ggplot2 4.0.0 is coming and why ultimately it’s on YOU to ensure your environments are reproducible
- Multi-language pipelines with rixpress
- Announcing rixpress
- Why we forked nixpkgs
- Using options() to inject a function’s internal variable for reproducible testing
- New year, new blog
2024
- Reproducible data science with Nix, part 13 – {rix} is on CRAN!
- Reproducible data science with Nix, part 12 – Nix as a polyglot build automation tool for data science
- Reproducible data science with Nix, part 11 – build and cache binaries with Github Actions and Cachix
- Reproducible data science with Nix, part 10 – contributing to nixpkgs
- Reproducible data science with Nix, part 9 – rix is looking for testers!
2023
- Reproducible data science with Nix, part 8 – nixpkgs, a tale of the magic of free and open source software and a call for charity
- Reproducible data science with Nix, part 7 – Building a Quarto book using Nix on Github Actions
- An overview of what’s out there for reproducibility with R
- ZSA Voyager review
- Reproducible data science with Nix, part 6 – CI/CD has never been easier
- Reproducible data science with Nix, part 5 – Reproducible literate programming with Nix and Quarto
- Reproducible data science with Nix, part 4 – So long, {renv} and Docker, and thanks for all the fish
- Reproducible data science with Nix, part 3 – frictionless {plumber} api deployments with Nix
- Reproducible data science with Nix, part 2 – running {targets} pipelines with Nix
- Reproducible data science with Nix, part 1 – what is Nix
- How to self-publish a technical book on Leanpub and Amazon using Quarto
- Why you should consider working on a dockerized development environment
- I’ve been blogging for 10 years
- Automating checks of handcrafted Word tables with {docxtractr}
- Software engineering techniques that non-programmers who write a lot of code can benefit from — the DRY WIT approach
- What I’ve learned making an .epub Ebook with Quarto
- MRAN is getting shutdown - what else is there for reproducibility with R, or why reproducibility is on a continuum?
2022
- Code longevity of the R programming language
- Functional programming explains why containerization is needed for reproducibility
- Reproducibility with Docker and Github Actions for the average R enjoyer
- Open source is a hard requirement for reproducibility
- How to deal with annoying medium sized data inside a Shiny app
- A Linux Live USB as a statistical programming dev environment
- R, its license and my take on it
- Why and how to use JS in your Shiny app
- What’s the fastest way to search and replace strings in a data frame?
- R will always be arcane to those who do not make a serious effort to learn it…
- Some learnings from functional programming you can use to write safer programs
- Get packages that introduce unique syntax adopted less?
- chronicler is now available on CRAN
- Self-documenting {ggplot}s thanks to the power of monads!
- Why you should(n’t) care about Monads if you’re an R programmer
- The {chronicler} package, an implementation of the logger monad in R
- Capture errors, warnings and messages
- Add logging to your functions using my newest package {loud}
2021
- How to write code that returns (Rmarkdown) code
- Speedrunning row-oriented workflows
- The quest for fast(er?) row-oriented workflows
- Is it worth the weight?
- Building your own knitr compile farm on your Raspberry Pi with {plumber}
- Dealing with non-representative samples with post-stratification
- The link between keyboard layouts and typing speed - Data collection phase
- How to treat as many files as fit on your hard disk without loops (sorta) nor running out of memory all the while being as lazy as possible
- Using explainability methods to understand (some part) of the spread of COVID-19 in a landlocked country
- Server(shiny)-less dashboards with R, {htmlwidgets} and {crosstalk}
- R makes it too easy to write papers
- How to draw a map of arbitrary contiguous regions, or visualizing the spread of COVID-19 in the Greater Region
2020
- A year in review
- (Half) Lies, (half) truths and (half) statistics
- Poorman’s automated translation with R and Google Sheets using {googlesheets4}
- Graphical User Interfaces were a mistake but you can still make things right
- It’s time to retire the “data scientist” label
- Building apps with {shinipsum} and {golem}
- The Raspberry Pi 4B as a shiny server
- Gotta go fast with “{tidytable}”
- Exploring NACE codes
- No excuse not to be a Bayesian anymore
- How to basic: bar plots
- What would a keyboard optimised for Luxembourguish look like?
- Explainbility of {tidymodels} models with {iml}
- Machine learning with {tidymodels}
- Synthetic micro-datasets: a promising middle ground between data privacy and data analysis
- Dynamic discrete choice models, reinforcement learning and Harold, part 2
- Dynamic discrete choice models, reinforcement learning and Harold, part 1
2019
- Intrumental variable regression and machine learning
- Multiple data imputation and explainability
- Cluster multiple time series using K-means
- Split-apply-combine for Maximum Likelihood Estimation of a linear model
- {disk.frame} is epic
- Modern R with the tidyverse is available on Leanpub
- Using linear models with binary dependent variables, a simulation study
- Statistical matching, or when one single data source is not enough
- Curly-Curly, the successor of Bang-Bang
- Intermittent demand, Croston and Die Hard
- Using cosine similarity to find matching documents: a tutorial using Seneca’s letters to his friend Lucilius
- The never-ending editor war (?)
- For posterity: install {xml2} on GNU/Linux distros
- Fast food, causality and R packages, part 2
- Fast food, causality and R packages, part 1
- Historical newspaper scraping with {tesseract} and R
- Get text from pdfs or images using OCR: a tutorial with {tesseract} and {magick}
- Pivoting data frames just got easier thanks to pivot_wide() and pivot_long()
- Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 2
- Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 1
- Manipulating strings with the {stringr} package
- Building a shiny app to explore historical newspapers: a step-by-step guide
- Using Data Science to read 10 years of Luxembourguish newspapers from the 19th century
- Making sense of the METS and ALTO XML standards
- Looking into 19th century ads from a Luxembourguish newspaper with R
2018
- R or Python? Why not both? Using Anaconda Python within R with {reticulate}
- Some fun with {gganimate}
- Objects types and some useful R functions for beginners
- Using the tidyverse for more than data manipulation: estimating pi with Monte Carlo methods
- Manipulate dates easily with {lubridate}
- What hyper-parameters are, and what to do with them; an illustration with ridge regression
- A tutorial on tidy cross-validation with R
- The best way to visit Luxembourguish castles is doing data science + combinatorial optimization
- Using a genetic algorithm for the hyperparameter optimization of a SARIMA model
- Searching for the optimal hyper-parameters of an ARIMA model in parallel: the tidy gridsearch approach
- Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport
- Analyzing NetHack data, part 2: What players kill the most
- Analyzing NetHack data, part 1: What kills the players
- From webscraping data to releasing it as an R package to share with the world: a full tutorial with data from NetHack
- Maps with pie charts on top of each administrative division: an example with Luxembourg’s elections data
- Getting the data from the Luxembourguish elections out of Excel
- Exporting editable plots from R to Powerpoint: making ggplot2 purrr with officer
- How Luxembourguish residents spend their time: a small {flexdashboard} demo using the Time use survey data
- Going from a human readable Excel file to a machine-readable csv with {tidyxl}
- The year of the GNU+Linux desktop is upon us: using user ratings of Steam Play compatibility to play around with regex and the tidyverse
- Dealing with heteroskedasticity; regression with robust standard errors using R
- Missing data imputation and instrumental variables regression: the tidy approach
- Forecasting my weight with R
- Getting data from pdfs using the pdftools package
- {pmice}, an experimental package for missing data imputation in parallel using {mice} and {furrr}
- Imputing missing values in parallel using {furrr}
- Get basic summary statistics for all the variables in a data frame
- Keep trying that api call with purrr::possibly()
- Getting {sparklyr}, {h2o}, {rsparkling} to work together and some fun with bash
- Importing 30GB of data into R with sparklyr
- Predicting job search by training a random forest on an unbalanced dataset
- Mapping a list of functions to a list of datasets with a list of columns as arguments
- It’s lists all the way down, part 2: We need to go deeper
- It’s lists all the way down
2017
- Building formulae
- Teaching the tidyverse to beginners
- Peace of mind with purrr
- Easy peasy STATA-like marginal effects with R
- Why I find tidyeval useful
- tidyr::spread() and dplyr::rename_at() in action
- Lesser known dplyr 0.7* tricks
- Make ggplot2 purrr
- Introducing brotools
- Lesser known purrr tricks
- Lesser known dplyr tricks
- How to use jailbreakr
- My free book has a cover!
2016
- Functional programming and unit testing for data munging with R available on Leanpub
- Work on lists of datasets instead of individual datasets by using functional programming
- I’ve started writing a ‘book’: Functional programming and unit testing for data munging with R
- Merge a list of datasets together
- Read a lot of datasets at once with R
- Data frame columns as arguments to dplyr functions
- Careful with tryCatch
- Unit testing with R
2015
- Bootstrapping standard errors for difference-in-differences estimation with R
- Update to Introduction to programming econometrics with R
- Export R output to a file
- Introduction to programming econometrics with R
2014
- R, R with Atlas, R with OpenBLAS and Revolution R Open: which is fastest?
- Object Oriented Programming with R: An example with a Cournot duopoly
2013
- Using R as a Computer Algebra System with Ryacas
- Nonlinear Gmm with R - Example with a logistic regression
- Method of Simulated Moments with R
- Simulated Maximum Likelihood with R