Econometrics and Free Software by Bruno Rodrigues.

How to write code that returns (Rmarkdown) code

R

One of the most useful aspects of using a programming language instead of… well, not using a programming language, is that you can write code in a way that minimizes, and ideally eliminates, the need to repeat yourself.

For instance, you can write a function to show you a frequency table, like so:

suppressMessages(library(dplyr))

create_table <- function(dataset, var){

  var <- enquo(var)

  dataset %>%
    count(!!var) %>%
    knitr::kable()

}

And you can now get some fancy-looking tables by simply writing:

create_table(mtcars, cyl)
| cyl|  n|
|---:|--:|
|   4| 11|
|   6|  7|
|   8| 14|

If I want such tables for hundreds of columns, I can use this function and loop over the columns and not have to write the code inside the body of the function over and over again. You’ll notice that the function create_table() makes use of some advanced programming techniques I have discussed here. There’s also an alternative way of programming with {dplyr}, using the {{}} construct I discussed here, but I couldn’t get what I’m going to show you here to work with {{}}.
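As a minimal sketch of what such a loop could look like: the `cols` vector is a hypothetical selection of columns, and converting each name with `rlang::sym()` and unquoting it with `!!` is my assumption about how to feed column names stored as strings to `create_table()`. The function itself is repeated only so that the block runs on its own.

```r
library(dplyr)

# Same function as above, repeated so the sketch is self-contained
create_table <- function(dataset, var){

  var <- enquo(var)

  dataset %>%
    count(!!var) %>%
    knitr::kable()

}

# Hypothetical selection of columns to tabulate
cols <- c("cyl", "gear", "carb")

# Turn each column name into a symbol and unquote it in the call
tables <- lapply(cols, function(col) create_table(mtcars, !!rlang::sym(col)))
names(tables) <- cols

# Each element is one frequency table
tables$cyl
```

The body of `create_table()` is written once, and adding a column only means adding a name to `cols`.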

Recently, I had to create an Rmarkdown document with many sections, where each section title was a question from a survey and the content was a frequency table. I wanted to write a function that would create a section with the right question title and then show the table, and I wanted to call this function over all the questions from the survey to have my document generated automatically.

The result should look like this, but it would be a PDF instead of HTML.

Let’s first load the data and see what it looks like:

library(dplyr)
library(purrr)
library(readr)

suppressMessages(
  survey_data <- read_csv(
    "https://gist.githubusercontent.com/b-rodrigues/0c2249dec5a9c9477e0d1ad9964a1340/raw/873bcc7532b8bad613235f029884df1d0b947c90/survey_example.csv"
  )
)

glimpse(survey_data)
## Rows: 100
## Columns: 4
## $ `Random question?`                         <chr> "no", "yes", "yes", "yes", …
## $ `Copy of Random question?`                 <chr> "yes", "yes", "no", "yes", …
## $ `Copy of Copy of Random question?`         <chr> "yes", "no", "no", "yes", "…
## $ `Copy of Copy of Copy of Random question?` <chr> "yes", "yes", "no", "yes", …

Each column name is the question, and each row is one answer to the survey question. To create the document I showed above, you’d probably write something like this:


## Random question?

` ``{r}

create_table(survey_data, `Random question?`)

` ``

## Copy of Random question?

` ``{r}

create_table(survey_data, `Copy of Random question?`)

` ``

## Copy of Copy of Random question?

` ``{r}

create_table(survey_data, `Copy of Copy of Random question?`)

` ``

## Copy of Copy of Copy of Random question?

` ``{r}

create_table(survey_data, `Copy of Copy of Copy of Random question?`)

` ``

As you can see, this gets tedious very quickly, especially if you have hundreds of variables. So how do you avoid repeating yourself? The solution has two steps. First, you should try to automate what you have as much as possible. Ideally, you don’t want to have to write the complete question every time. So let’s start by replacing the questions with simpler variable names:

questions <- colnames(survey_data)

codes <- paste0("var_", seq_along(questions))

lookup <- bind_cols("codes" = codes, "questions" = questions)

colnames(survey_data) <- codes

lookup is a data frame with the questions and their respective codes:

lookup
## tibble [4, 2] 
## codes     chr var_1 var_2 var_3 var_4
## questions chr Random question? Copy of Random question? Cop~

and our data now has simpler variable names:

glimpse(survey_data)
## Rows: 100
## Columns: 4
## $ var_1 <chr> "no", "yes", "yes", "yes", "no", NA, "no", NA, "no", "no", "no",…
## $ var_2 <chr> "yes", "yes", "no", "yes", "no", "yes", "yes", NA, "yes", NA, "n…
## $ var_3 <chr> "yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "yes",…
## $ var_4 <chr> "yes", "yes", "no", "yes", "yes", "no", "no", "yes", "no", "no",…
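With the lookup table in place, a question can be retrieved from its code. One thing to watch for once a survey has ten or more questions is that a pattern match on "var_1" also matches "var_10", "var_11" and so on, whereas an exact comparison does not. A minimal sketch, using a hypothetical 12-question lookup table (`toy_lookup`) rather than the survey data:

```r
# Hypothetical lookup table with 12 questions (not the survey data from the post)
toy_lookup <- data.frame(
  codes = paste0("var_", 1:12),
  questions = paste("Question", 1:12),
  stringsAsFactors = FALSE
)

# Exact comparison: one code, one question
exact <- toy_lookup$questions[toy_lookup$codes == "var_1"]

# Pattern match: "var_1" is also a substring of "var_10", "var_11" and "var_12"
partial <- toy_lookup$questions[grepl("var_1", toy_lookup$codes)]

length(exact)    # 1
length(partial)  # 4
```

With only four questions, as in this post, both approaches return the same single title.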

Doing this allows us to replace the source code of our Rmarkdown like so:

## `r lookup$questions[lookup$codes == "var_1"]`

` ``{r}
create_table(survey_data, var_1)
` ``

This already makes things easier, as now you only have to change var_1 to var_2, then to var_3, and so on; the inline code gets executed and the right title (the question text) appears. But how to go further? I don’t want to have to copy and paste this and change var_1 to var_2 etc. So the second step of the two-step solution is to use a function from {knitr} called knit_expand(), described here. The idea of knitr::knit_expand() is that it uses some Rmd source as a template, and also allows the user to define some variables that will be replaced at compile time. Simple examples are available here. I want to build upon that, because I need to pass my variable (in this case var_1 for instance) to my function create_table().
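To make the template idea concrete, here is a minimal sketch of `knitr::knit_expand()` on its own, before any table is involved. The template strings and the values passed in are made up for illustration.

```r
# A one-line template: {{question}} is a placeholder
template <- "## {{question}}"

# Values for the placeholders are passed as named arguments
title <- knitr::knit_expand(text = template, question = "Random question?")
title
# "## Random question?"

# Whatever sits between the braces is evaluated as R code
sentence <- knitr::knit_expand(text = "There are {{n + 1}} sections.", n = 3)
sentence
# "There are 4 sections."
```

The result is a plain character string, so it can be printed with `cat()` or combined with other text.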

The solution is to write another function that uses knitr::knit_expand(). This is what it could look like:

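# var is now expected to be a quosure (or a symbol), so there is no enquo() here
create_table <- function(dataset, var){

  dataset %>%
    count(!!var) %>%
    knitr::kable()

}


return_section <- function(var){

  # Expand the template: {{question}} becomes the question text whose code matches var
  a <- knitr::knit_expand(text = c("## {{question}}",   create_table(survey_data, var)),
                          question =  lookup$questions[lookup$codes == quo_name(var)])

  # Print the raw Markdown so that knitr can render it
  cat(a, sep = "\n")
}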

I needed to edit create_table() a little bit and remove the line var <- enquo(var). This is because I won’t be passing a bare variable name down to the function anymore, but a quosure, and there is a very good reason for it, as you’ll see. return_section() makes use of knit_expand(), and the text = argument is the template that will get expanded. {{question}} will get replaced by the variable I defined, which is the code I wrote above to automatically get the question text. Finally, var will get replaced by the variable I pass to the function.

First, let’s get it running on one single variable:

return_section(quo(var_1))
## ## Random question?
## |var_1 |  n|
## |:-----|--:|
## |no    | 40|
## |yes   | 44|
## |NA    | 16|

As you see, I had to use quo(var_1) and not only var_1. But apart from this, the function seems to work well. Putting this in an Rmarkdown document would create a section with the question as the text of the section and a frequency table as the body. I could now copy and paste this and only have to change var_1. But I don’t want to have to copy and paste! So the idea would be to loop the function over a list of variables.

I have such a list already:

codes
## [1] "var_1" "var_2" "var_3" "var_4"

But it’s a character vector, not a list of quosures, so this is not going to work (it will return an error):

walk(codes, return_section)

(I’m using walk() instead of map() because return_section() doesn’t return an object, but only shows something on screen. This is called a side effect, and walk() allows you to loop properly over functions that are called only for their side effects.)
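A small sketch of the difference, with a made-up function `announce()` that only prints:

```r
library(purrr)

# A function called only for its side effect: printing to the console
announce <- function(x) cat("Section for", x, "\n")

# walk() runs the function and invisibly returns its input unchanged
walked <- walk(c("var_1", "var_2"), announce)

# map() runs the function too, but also collects the return values
# (here NULL, since cat() returns nothing useful)
mapped <- map(c("var_1", "var_2"), announce)
```

In an Rmarkdown chunk, the list of `NULL`s built by `map()` would be printed along with the sections unless it is assigned or hidden, which is one practical reason to prefer `walk()` here.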

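The problem I have now is to convert the strings to something that can be unquoted with !!. This is possible using rlang::sym(), which turns each string into a symbol; symbols work with !! and quo_name() just like quosures do: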

sym_codes <- map(codes, sym)
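To see what this conversion produces, here is a short sketch. It uses `rlang::syms()`, a vectorised shorthand that is my suggestion rather than something used in this post, and stores the result under the hypothetical name `sym_codes_alt`.

```r
codes <- c("var_1", "var_2", "var_3", "var_4")

# One symbol per string, the same idea as map(codes, sym)
sym_codes_alt <- rlang::syms(codes)

# Each element is a symbol (class "name"), not a string
class(sym_codes_alt[[1]])

# Converting back gives the original strings
vapply(sym_codes_alt, rlang::as_string, character(1))
```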

And now I’m done:

walk(sym_codes, return_section)
## ## Random question?
## |var_1 |  n|
## |:-----|--:|
## |no    | 40|
## |yes   | 44|
## |NA    | 16|
## ## Copy of Random question?
## |var_2 |  n|
## |:-----|--:|
## |no    | 52|
## |yes   | 32|
## |NA    | 16|
## ## Copy of Copy of Random question?
## |var_3 |  n|
## |:-----|--:|
## |no    | 46|
## |yes   | 47|
## |NA    |  7|
## ## Copy of Copy of Copy of Random question?
## |var_4 |  n|
## |:-----|--:|
## |no    | 48|
## |yes   | 42|
## |NA    | 10|
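If you would rather pass the codes as plain strings, one possible variant is to do the string-to-symbol conversion inside the function. This is only a sketch under my own assumptions: `return_section_chr()`, `toy_data` and `toy_lookup` are hypothetical names, and the data and lookup table are passed as arguments instead of being read from the global environment.

```r
library(dplyr)

# Hypothetical stand-ins for survey_data and lookup
toy_data <- tibble(
  var_1 = c("yes", "no", "yes"),
  var_2 = c("no", "no", "yes")
)

toy_lookup <- tibble(
  codes = c("var_1", "var_2"),
  questions = c("First question?", "Second question?")
)

return_section_chr <- function(code, dataset, lookup){

  # Convert the string to a symbol and unquote it inside count()
  tab <- dataset %>%
    count(!!rlang::sym(code)) %>%
    knitr::kable()

  # Same template as before; the title is found by exact match on the code
  section <- knitr::knit_expand(
    text = c("## {{question}}", tab),
    question = lookup$questions[lookup$codes == code]
  )

  cat(section, sep = "\n")
  invisible(section)
}

# Loop directly over the character vector of codes
purrr::walk(toy_lookup$codes, return_section_chr,
            dataset = toy_data, lookup = toy_lookup)
```

The trade-off is that the function no longer accepts quosures, but the calling code can loop over `codes` without a separate conversion step.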

Putting this in an Rmarkdown source creates a PDF (or Word, or HTML) document with one section per question, without having to copy and paste, which is quite error-prone. Here is the final Rmarkdown file. You’ll notice that the last chunk has the option results = 'asis', which is needed for this trick to work: it tells knitr to treat the printed output as raw Markdown instead of wrapping it in a code block.
