R makes it too easy to write papers
I’m currently working on a preprint on the spread of COVID-19 in Luxembourg. My hypothesis is that landlocked countries, especially ones like Luxembourg that have very close ties to their neighbours, have a very hard time controlling the pandemic, unlike island countries, which can completely close off their borders, impose very drastic quarantine measures on anyone who still has to come in, and successfully wipe out the disease through strict lockdowns and contact tracing.
In actuality, this started more as a project in which I simply wanted to look at COVID-19 cases for Luxembourg and its neighbouring regions. As I started digging and writing code, this evolved into this package, which makes it easy to download open data on the daily COVID-19 cases from Luxembourg and its neighbours. I also blogged about it here. While creating and animating the map that you see in that blog post, I thought of the hypothesis I wanted to test. Maybe it won’t work (preliminary results are encouraging, however), but I also took this opportunity to write a preprint using only R, Rmarkdown and packages that make writing something like that easy. This blog post is a shallow review of these tools.
By the way, you can take a look at the repo with the preprint here, and I’ll be writing about it soon as well.
Packages as by-products of papers
The first thing I did was download data from the various open data portals, make sense of it and
then plot it. At first, I did so in a very big and messy script file. As time went on, I felt more
and more disgusted with this script and wanted to make something cleaner out of it. This is how the
package I already mentioned above came to be. It took some time to prepare, but now it makes
updating my plots and machine learning models much faster. It also makes the
paper more “interesting”; not everyone is interested in the paper itself, but some might be interested
in the data, or in the process of making the package itself. I think that there are many examples
of such packages as by-products of papers. Papers that present and discuss new
methods, especially, are very often accompanied by a package that makes it easy for readers to use
the new method.
Package development is made easy with {usethis}.
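To give an idea of what this looks like, here is a minimal sketch of setting up a package skeleton with {usethis}. The package name covidExample, the file name get_data and the licence holder are hypothetical placeholders, not the names used for the actual package, and the sketch assumes {usethis} is installed.

```r
library(usethis)

# Hypothetical package name; created in a temporary folder so nothing real is touched
pkg_path <- file.path(tempdir(), "covidExample")

# Create the skeleton: DESCRIPTION, NAMESPACE and the R/ folder
create_package(pkg_path, open = FALSE, rstudio = FALSE)

# Make the new package the active project for the helpers below
proj_set(pkg_path)

# Add an (empty) script that would hold the download functions, and a licence
use_r("get_data", open = FALSE)
use_mit_license("Your Name")
```

From there, the functions of the messy script can be moved, one by one, into files under R/.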
Starting a draft with {rticles}
The second thing I did was start a draft with {rticles}. This package allows users to start an
Rmarkdown draft with a single command. Users can choose among many different templates for many
different journals; I chose the arXiv template, as I might publish the preprint there. To do so,
I used the following command:
rmarkdown::draft("paper.Rmd", template = "arxiv", package = "rticles")
I can now edit this Rmd file and compile it to a nice-looking PDF very easily. But I don’t do
so in the “traditional” way of knitting the Rmd file from RStudio (or rather, from Spacemacs,
my editor of choice). No, no, for this I use the magnificent {targets} package.
Setting up a clean, automated and reproducible workflow with {targets}
{targets} is the latest package by William Landau, who is also the author of {drake}. I was
very impressed by {drake} and even made a video about it,
but now {targets} will replace {drake} as THE build automation tool for the R programming language.
I started using it for this project, and just like {drake}, it’s really an amazing package.
It allows you to declare your project as a series of steps, each one of them being a call to a function.
It’s very neat and clean. The dependencies between each of the steps and the objects that are created
at each step are tracked by {targets}, and should one of them get updated (for instance, because
you changed the code of the underlying function), every object that depends on it will also get
updated once you run the pipeline again.
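To make the idea of steps and tracked dependencies concrete, here is a minimal sketch of a three-step pipeline. The functions get_cases(), add_growth() and summarise_growth(), and the numbers they use, are made up for illustration and have nothing to do with the preprint; the sketch assumes {targets} is installed and runs in a throwaway folder.

```r
library(targets)

# Throwaway project folder so the sketch does not touch a real project
proj <- file.path(tempdir(), "pipeline_sketch")
dir.create(proj, showWarnings = FALSE)
old_wd <- setwd(proj)

# tar_script() writes the pipeline definition to the file _targets.R
tar_script({
  # Hypothetical functions standing in for the real data steps
  get_cases <- function() {
    data.frame(day = 1:10, cases = c(1, 2, 4, 7, 12, 20, 33, 50, 70, 95))
  }
  add_growth <- function(data) transform(data, growth = c(NA, diff(cases)))
  summarise_growth <- function(data) mean(data$growth, na.rm = TRUE)

  # Each step is a target: a name and a call to a function
  list(
    tar_target(raw_cases, get_cases()),
    tar_target(cases_growth, add_growth(raw_cases)),
    tar_target(mean_growth, summarise_growth(cases_growth))
  )
}, ask = FALSE)

tar_make()  # builds the three targets in dependency order
tar_make()  # nothing changed since the first run, so the targets are skipped

# Read the value of the last target back from the store
mean_growth <- tar_read(mean_growth)

setwd(old_wd)
```

If the body of add_growth() were edited, the next tar_make() would rebuild cases_growth and mean_growth, but not raw_cases, since raw_cases does not depend on that function.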
This can get complex very quickly, and here is the network of objects, functions and their dependencies for the preprint I’m writing:
Imagine keeping track of all this in your head. Now I won’t go much into how to use {targets},
because the user manual is very detailed. Also, you can
inspect the repository of my preprint I linked above to figure out the basics of {targets}.
What’s really neat, though, is that the Rmd file of your paper is also a target that gets built
automatically. If you check out my repository, you will see that it’s the last target that is built.
And if you check the Rmd file itself, you will see the only R code I use is:
tar_load(something)
tar_load() is a {targets} function that loads an object from the pipeline into the R session; in the
example above this object is called something. Once loaded, it can be used in the paper: for instance,
if something is a ggplot object, printing it in a chunk makes the plot appear on that spot in the paper.
It’s really great, because the paper
itself gets compiled very quickly once all the targets are built.
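As a small illustration of how an already built target is brought into a session, here is a sketch using a one-target pipeline. The target is a plain named vector rather than a ggplot object, purely to keep the example light, and the folder name is a placeholder; the sketch assumes {targets} is installed.

```r
library(targets)

# Throwaway project folder for the sketch
proj <- file.path(tempdir(), "tar_load_sketch")
dir.create(proj, showWarnings = FALSE)
old_wd <- setwd(proj)

# A one-target pipeline; `something` stands in for a plot or a table
tar_script(
  list(tar_target(something, c(a = 1, b = 2))),
  ask = FALSE
)
tar_make()

# tar_load() creates a variable named `something` in the current session
tar_load(something)
something  # evaluating (printing) the object is what displays it in a chunk

# tar_read() returns the value instead, so it can be printed or assigned directly
same_thing <- tar_read(something)

setwd(old_wd)
```

Either way, the object is read from the store rather than recomputed, which is why knitting the paper is fast once the targets are built.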
Machine learning, and everything else
Last year I wrote a blog post about {tidymodels}, which you can find here.
Since then, the package has evolved, and it’s in my opinion definitely one of the best machine learning
packages out there. Just like the other tools I discussed in this blog post, it abstracts away
many unimportant idiosyncrasies of many other packages and ways of doing things, and lets you
focus on what matters: getting results and presenting them neatly.
I think that this is what I really like about the R programming language, and the ecosystem of
packages built on top of it. Combining functional programming, build automation tools, markdown,
and all the helper packages like {usethis} makes it really easy to go from an idea to a paper, or to an
interactive app using {shiny}, very quickly.