The {chronicler} package, an implementation of the logger monad in R
Back in February I discussed a package I was working on which allowed users to add logging to function calls. I named the package {loudly}
but decided to rename it to {chronicler}
.
I have been working on it for the past few weeks, and I think that a CRAN release could happen soon.
Introduction
So what does {chronicler}
do? {chronicler}
allows you do decorate functions, so that they provide enhanced output:
library(chronicler)
## Loading required package: rlang
r_sqrt <- record(sqrt)
a <- r_sqrt(1:5)
Object a
is now an object of class chronicle
. Let’s print a
to the terminal:
a
## ✔ Value computed successfully:
## ---------------
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
##
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().
as the output says, a
is an object of type chronicle
. Let’s use read_log()
as suggested:
read_log(a)
## [1] "Complete log:"
## [2] "✔ sqrt(1:5) ran successfully at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000240325927734375 secs"
The log tells us how a
was built, and it’s especially useful for objects that are the result of many function calls:
r_sqrt <- record(sqrt)
r_exp <- record(exp)
r_mean <- record(mean)
b <- 1:10 |>
r_sqrt() |>
bind_record(r_exp) |>
bind_record(r_mean)
The log gives all the details:
read_log(b)
## [1] "Complete log:"
## [2] "✔ sqrt(1:10) ran successfully at 2022-04-01 21:14:28"
## [3] "✔ exp(.c$value) ran successfully at 2022-04-01 21:14:28"
## [4] "✔ mean(.c$value) ran successfully at 2022-04-01 21:14:28"
## [5] "Total running time: 0.00820255279541016 secs"
The end result, or what is called value
can be obtained using pick()
(you could also use a$value
):
pick(a, "value")
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
pick(b, "value")
## [1] 11.55345
Composing decorated functions
bind_record()
is used to pass the output from one decorated function to the next:
suppressPackageStartupMessages(
library(dplyr)
)
r_group_by <- record(group_by)
r_select <- record(select)
r_summarise <- record(summarise)
r_filter <- record(filter)
output <- starwars %>%
r_select(height, mass, species, sex) %>%
bind_record(r_group_by, species, sex) %>%
bind_record(r_filter, sex != "male") %>%
bind_record(r_summarise,
mass = mean(mass, na.rm = TRUE)
)
read_log(output)
## [1] "Complete log:"
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"
## [3] "✔ group_by(.c$value,species,sex) ran successfully at 2022-04-01 21:14:28"
## [4] "✔ filter(.c$value,sex != \"male\") ran successfully at 2022-04-01 21:14:28"
## [5] "✔ summarise(.c$value,mean(mass, na.rm = TRUE)) ran successfully at 2022-04-01 21:14:28"
## [6] "Total running time: 0.11384654045105 secs"
The value can then be saved in a new variable:
(my_df <- pick(output, "value"))
## # A tibble: 9 × 3
## # Groups: species [9]
## species sex mass
## <chr> <chr> <dbl>
## 1 Clawdite female 55
## 2 Droid none 69.8
## 3 Human female 56.3
## 4 Hutt hermaphroditic 1358
## 5 Kaminoan female NaN
## 6 Mirialan female 53.1
## 7 Tholothian female 50
## 8 Togruta female 57
## 9 Twi'lek female 55
You can save the output
object with saveRDS()
and share it; your colleague can then read the log to learn how the object was created.
This package also ships with a dedicated pipe, %>=%
which you can use instead of bind_record()
:
output_pipe <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sex) %>=%
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
pick(output_pipe, "value")
## # A tibble: 9 × 3
## # Groups: species [9]
## species sex mass
## <chr> <chr> <dbl>
## 1 Clawdite female 55
## 2 Droid none 69.8
## 3 Human female 56.3
## 4 Hutt hermaphroditic 1358
## 5 Kaminoan female NaN
## 6 Mirialan female 53.1
## 7 Tholothian female 50
## 8 Togruta female 57
## 9 Twi'lek female 55
Condition handling
By default, errors and warnings get caught and composed in the log:
errord_output <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sx) %>=% # typo, "sx" instead of "sex"
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
errord_output
## ✖ Value computed unsuccessfully:
## ---------------
## [1] NA
##
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().
Reading the log tells you which function failed, and with which error message:
read_log(errord_output)
## [1] "Complete log:"
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"
## [3] "✖ group_by(.c$value,species,sx) ran unsuccessfully with following exception: Must group by variables found in `.data`.\n✖ Column `sx` is not found. at 2022-04-01 21:14:28"
## [4] "✖ filter(.c$value,sex != \"male\") ran unsuccessfully with following exception: no applicable method for 'filter' applied to an object of class \"logical\" at 2022-04-01 21:14:28"
## [5] "✖ summarise(.c$value,mean(mass, na.rm = TRUE)) ran unsuccessfully with following exception: no applicable method for 'summarise' applied to an object of class \"logical\" at 2022-04-01 21:14:28"
## [6] "Total running time: 0.163575887680054 secs"
It is also possible to only capture errors, or catpure errors, warnings and messages using the strict
parameter of record()
# Only errors:
r_sqrt <- record(sqrt, strict = 1)
# Nothing will be captured here, since sqrt(-10) returns a NA and a warning
r_sqrt(-10) |>
read_log()
## Warning in .f(...): NaNs produced
## [1] "Complete log:"
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NA at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000255584716796875 secs"
# Errors and warnings:
r_sqrt <- record(sqrt, strict = 2)
# The warning gets captured
r_sqrt(-10) |>
read_log()
## [1] "Complete log:"
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NaNs produced at 2022-04-01 21:14:28"
## [3] "Total running time: 0.00019383430480957 secs"
# Errors, warnings and messages
my_f <- function(x){
message("this is a message")
10
}
record(my_f, strict = 3)(10) |>
read_log()
## [1] "Complete log:"
## [2] "✖ my_f(10) ran unsuccessfully with following exception: this is a message\n at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000336408615112305 secs"
Advanced logging
You can provide a function to record()
, which will be evaluated on the output. This makes it possible to, for example, monitor the size of a data frame throughout the pipeline. In the example below I provide dim()
, which will return the dimensions of the data frame, as an argument to record()
:
r_group_by <- record(group_by)
r_select <- record(select, .g = dim)
r_summarise <- record(summarise, .g = dim)
r_filter <- record(filter, .g = dim)
output_pipe <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sex) %>=%
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
The $log_df
element of a chronicle
object contains detailed information. In most cases you don’t need to worry about it:
pick(output_pipe, "log_df")
## # A tibble: 4 × 8
## outcome `function` arguments message start_time end_time
## <chr> <chr> <chr> <chr> <dttm> <dttm>
## 1 ✔ Success select ".,heigh… NA 2022-04-01 21:14:28 2022-04-01 21:14:28
## 2 ✔ Success group_by ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:28
## 3 ✔ Success filter ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:29
## 4 ✔ Success summarise ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:29
## # … with 2 more variables: run_time <drtn>, g <list>
but if you want to look at the output of .g
, then you have to grab it and see:
# I coerce it to a data.frame just for the output here on my blog, to make the column `g` readable
as.data.frame(output_pipe$log_df[, c("function", "g")])
## function g
## 1 select 87, 4
## 2 group_by NA
## 3 filter 23, 4
## 4 summarise 9, 3
We can see that the dimension of the dataframe was (87, 4) after the call to select()
, (23, 4) after the call to filter()
and finally (9, 3) after the call to summarise()
.
Monads
This package implements a logger monad. I might talk about monads in the future, but probably in a video; if you don’t know what monads are, don’t worry, no one really knows. Legend has it that to truly understand what monads are you have to have a PhD in computer science and have been born in the former Soviet Union. But to make things simple, you can think of a monad as a way to:
- embelish functions to provide additional output without having to touch the function’s core behaviour
- a way to compose these functions and work with the embelished outputs (also called monadic values)
- monadic values are basically containers that contain the actual value of the function evaluated on its inputs and something else (here, a log)
Monads are quite useful in some programming languanges, like Haskell. Not so much in R, but I think that the logger monad I propose here can be quite useful. So let me know if you find it useful or if you have suggestions!
You can install {chronicler}
from its github repo.