Get packages that introduce unique syntax adopted less?
I have this hypothesis that packages that introduce a unique syntax, or a workflow change, get adopted less by users, even if what these packages do is super useful. I’m going to discuss two examples of packages that I think are really, really useful, but sometimes I wonder how many R users use them, or would use them if they were aware these packages existed. I myself only use one of them!
The first package is {typed}, which introduces a type system for R. No more silent conversion to and from types without your knowing! If you don’t know what a type system is, consider the following:
nchar("100000000")
## [1] 9
You get 9 back, no problem. But if you do:
nchar(100000000)
## [1] 5
You get 5 back… what in the Lord’s name happened here? What happened is that the number 100000000 was implicitly converted to a character string. But because of all these 0s, the conversion looks like this:
as.character(100000000)
## [1] "1e+08"
It gets converted to a character alright, but scientific notation gets used! So yes, 1e+08 is 5 characters long… Ideally, nchar() would at least warn you that this conversion is happening, or maybe even error. After all, it’s called nchar(), not nnumeric() or whatever. (Thanks to @cararthompson for this!)
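If what you actually want is the number of digits, one way to sidestep the implicit conversion is to do the conversion yourself and switch scientific notation off. A minimal base-R sketch using format() with scientific = FALSE (the variable names are only for illustration):

```r
x <- 100000000

# Implicit conversion: nchar() ends up counting the characters of "1e+08"
nchar(x)        # 5

# Explicit conversion, with scientific notation turned off
x_chr <- format(x, scientific = FALSE)
x_chr           # "100000000"
nchar(x_chr)    # 9
```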
A solution could be to write a wrapper around nchar():
nchar2 <- function(x, ...){
# Refuse anything that is not already a character vector,
# instead of silently converting it
stopifnot("x is not a character" = is.character(x))
nchar(x, ...)
}
Now this function is safe:
nchar2(123456789)
## Error in nchar2(123456789) : x is not a character
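The wrapper above hard-codes a single check for a single function. As a sketch of how such a check could be made reusable in plain base R, here is a function operator that takes any function and returns a version that refuses non-character input for its first argument. `require_character()` is a hypothetical helper, and this is not how {typed} works under the hood:

```r
# Hypothetical helper: wrap `f` so that its first argument must be a character vector
require_character <- function(f) {
  force(f)
  function(x, ...) {
    if (!is.character(x)) {
      stop("x is not a character, it is of type ", typeof(x), call. = FALSE)
    }
    f(x, ...)
  }
}

safe_nchar <- require_character(nchar)
safe_toupper <- require_character(toupper)

safe_nchar("100000000")
safe_toupper("abc")
```

Calling `safe_nchar(100000000)` now stops with the message "x is not a character, it is of type double" instead of quietly returning 5.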
{typed} makes writing safe functions like this easier. Using {typed}, you can write the wrapper like this:
library(typed, warn.conflicts = FALSE)
strict_nchar <- ? function(x = ? Character(), ...){
nchar(x, ...)
}
{typed} introduces ? (masking the base ? function, which is used to read a function’s docs), allowing you to set the type of the function’s arguments. It’s also possible to set the return type of the function:
strict_nchar <- Integer() ? function(x = ? Character(), ...){
nchar(x, ...)
}
strict_nchar("10000000")
## [1] 8
This is very useful if you want to write safe functions in a very concise and clean way.
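The return-type annotation can be mimicked in base R too, just less concisely. Here is a minimal sketch, assuming a hypothetical `with_return_check()` helper (again, not {typed}’s API): it checks the argument before calling the function and the result afterwards, which is roughly what `Integer() ? function(x = ? Character(), ...)` expresses in one line.

```r
# Hypothetical helper: check the first argument with `arg_check` before calling `f`,
# and the result with `return_check` after calling it
with_return_check <- function(f, arg_check, return_check) {
  force(f)
  force(arg_check)
  force(return_check)
  function(x, ...) {
    stopifnot("argument has the wrong type" = arg_check(x))
    out <- f(x, ...)
    stopifnot("return value has the wrong type" = return_check(out))
    out
  }
}

# Roughly a base-R counterpart of Integer() ? function(x = ? Character(), ...)
strict_nchar_base <- with_return_check(nchar, is.character, is.integer)
strict_nchar_base("10000000")
```

A function whose result has the wrong type, for instance `with_return_check(sqrt, is.numeric, is.integer)`, would stop with "return value has the wrong type", since sqrt() returns a double.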
The second kind of package I was thinking about is one like {targets}, which forces users to structure their projects in a very specific way. I really like {targets} and have been using it for quite some time. {targets} takes inspiration from build automation tools from the software development world and introduces the concept of build automation in R. If you’re a Linux user, you’ve probably dealt with Makefiles (especially if you’ve been using Linux for more than 10 years), and {targets} works in a similar way: you write a script in which you define targets, and these get built in a reproducible way.
If you’d like to see it in action, take a look at this video of mine. As useful as it is, I can imagine that some potential users will end up not adopting it, because {targets} really does things in a unique and different way. Most people do not know what build automation tools are, and the cost of adopting {targets} seems disproportionately high relative to its benefits (but believe me, it is well worth the cost!).
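To make the build-automation idea concrete without depending on {targets} itself, here is a toy sketch in base R. It is not how {targets} works internally (that package tracks hashes and stores results in a `_targets/` directory); `make_target()`, `run_pipeline()` and the `state` environment are hypothetical names. Each target remembers the inputs it was last built from, and is only rebuilt when those inputs change:

```r
# Toy "make" in base R: a cache of built targets plus a log of what was (re)built
state <- new.env()
state$cache <- list()
state$log <- character(0)

# Hypothetical helper: rebuild target `name` only if its inputs changed
# since the last build; otherwise return the cached value
make_target <- function(name, command, inputs) {
  entry <- state$cache[[name]]
  if (!is.null(entry) && identical(entry$inputs, inputs)) {
    return(entry$value)  # up to date, skip the work
  }
  state$log <- c(state$log, name)
  value <- do.call(command, inputs)
  state$cache[[name]] <- list(value = value, inputs = inputs)
  value
}

# A two-step pipeline: keep automatic cars (am == 0), then average hp by cyl
run_pipeline <- function(data) {
  automatic <- make_target("automatic", function(d) d[d$am == 0, ], list(d = data))
  make_target("mean_hp", function(cars) tapply(cars$hp, cars$cyl, mean),
              list(cars = automatic))
}

run_pipeline(mtcars)  # first run: builds "automatic" and "mean_hp"
run_pipeline(mtcars)  # nothing changed: both targets are skipped

# Change a car with a manual gearbox (row 1, the Mazda RX4, has am == 1):
# "automatic" is rebuilt, but its value is unchanged, so "mean_hp" is skipped
mtcars2 <- mtcars
mtcars2$hp[1] <- 999
run_pipeline(mtcars2)

state$log
```

Skipping downstream work when an upstream result turns out unchanged is the kind of saving that makes build automation worth its learning curve on long-running projects.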
Now here’s the meat of the post: I think that packages like these, even though they’re very useful, get adopted less by users than other packages that either:
- do not introduce a unique way of doing things;
- or have alternatives available.
The reason, I believe, is that users do not feel comfortable adopting a unique syntax and way of doing things that impacts their code so much, because if these libraries get abandoned, users will need to completely rewrite their scripts. This is especially true when neither of the two conditions above holds.
Take {dplyr}: one could argue that it introduces both a unique syntax and a very specific workflow/way of doing things:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mtcars %>%
filter(am == 0) %>%
group_by(cyl) %>%
summarise(mean_hp = mean(hp))
## # A tibble: 3 × 2
## cyl mean_hp
## <dbl> <dbl>
## 1 4 84.7
## 2 6 115.
## 3 8 194.
But there are alternatives to it (a lot of {dplyr} functionality is already covered by base functions, and there’s also {data.table}), so IF {dplyr} were abandoned by RStudio (which will never happen, but let’s assume it for the sake of argument), users could switch to {data.table}. Not so with more niche packages like the ones discussed above.
Also, even {dplyr}’s unique syntax, which makes heavy use of %>%, is not so unique anymore since the release of R 4.1. A base approach to the above snippet would be:
mtcars |>
subset(am == 0) |>
with(aggregate(hp, by = list(cyl), mean))
## Group.1 x
## 1 4 84.66667
## 2 6 115.25000
## 3 8 194.16667
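The with()/aggregate() version returns generic column names (Group.1 and x). If you would rather get named columns, closer to the {dplyr} output, the formula interface of base aggregate() is one option. A short sketch of the same computation:

```r
# Mean horsepower by cylinder count, for cars with automatic transmission (am == 0)
auto_cars <- subset(mtcars, am == 0)
hp_by_cyl <- aggregate(hp ~ cyl, data = auto_cars, FUN = mean)
hp_by_cyl
```

The result has columns named cyl and hp, with the same means as above.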
Before R 4.1, looking at {dplyr} chains felt like looking at a completely different language than base R, but with the introduction of |> that is no longer the case. The other thing packages like {dplyr} have going for them, even when they introduce a completely new syntax and have no real alternative (as is the case for {ggplot2}; I don’t consider base plotting an alternative to {ggplot2}, because it works in a completely different way), is that they have big teams and/or companies like RStudio behind them. So users feel much more confident adopting such packages than if they were written by a very small team (sometimes even just one person).
The reason I’m thinking about all this is that I recently released a package, {chronicler}, that raises all of the above red flags:
- new syntax (it makes heavy use of a new pipe, %>=%);
- it forces a new workflow on users;
- it is developed by a single dude in his free time who isn’t even a programmer (me).
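For readers wondering how a package can ship “a new pipe” at all: R lets anyone define an infix operator by naming a function `%something%`. The toy below is a hypothetical logging pipe, `%>log%`, written only to illustrate that mechanism; it is not how {chronicler}’s `%>=%` is implemented.

```r
# Hypothetical operator: apply `rhs` to the value in `lhs` and record the step
`%>log%` <- function(lhs, rhs) {
  fn_name <- deparse(substitute(rhs))
  # Unwrap the value and the previous steps if lhs already went through the pipe
  value <- if (inherits(lhs, "logged")) lhs$value else lhs
  steps <- if (inherits(lhs, "logged")) lhs$steps else character(0)
  structure(
    list(value = rhs(value), steps = c(steps, paste("Ran", fn_name))),
    class = "logged"
  )
}

# %...% operators are left-associative, so this is (c(4, 9, 16) %>log% sqrt) %>log% sum
out <- c(4, 9, 16) %>log% sqrt %>log% sum
out$value
out$steps
```

Because the operator returns a wrapped object rather than a plain value, every step in the chain has to go through it, which is exactly the kind of workflow change discussed above.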
If I were a potential user, I honestly don’t know if I’d adopt this package for anything critical. I might play around with it a bit, but using it in production? What if the author (me) gets sick of it after a few months/years? Even I, as the author, cannot guarantee today that this package will still be maintained in 2 years. Users with important stuff running on top of my package would then be screwed. I think that the only way for such packages to succeed is if a sizeable community gathers around them, the team of developers expands, and, ideally, they get backed by a company or organisation (like RStudio with all their packages, or rOpenSci with {targets}).
To be clear, I am NOT complaining about free and open source software: these problems also exist with proprietary software. If a company builds something and decides to abandon it, that’s it, it’s over. If there are no alternatives to it, users are screwed just as well. And companies can also go bankrupt or shift their focus to other, more profitable projects. At least with free and open source software, if the author of a package has had enough and decides not to maintain it anymore, there is still the possibility of someone else taking it over, and this someone else might be a user! There is also the possibility of using Docker to run an old version of R with older versions of packages, even if they’re abandoned. So maybe it’s not so bad.
What do you think? I’d be curious to hear your thoughts on this GitHub issue I opened.
Oh, and by the way, IF you’re using {chronicler} after reading this, really, thank you.
Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!