Get basic summary statistics for all the variables in a data frame
I have added a new function to my {brotools} package, called describe(), which takes a data frame
as an argument and returns another data frame with descriptive statistics. It is very much inspired
by the {skimr} package, and also by assist::describe() (both are hosted on GitHub). I wanted to
write my own for two reasons: first, as an exercise, and second, because the only function I really
needed from {skimr} was skim_to_wide(). So instead of installing a whole package for a single
function, I decided to write my own (since I use {brotools} daily).
Below you can see it in action:
library(dplyr)
data(starwars)
brotools::describe(starwars)
## # A tibble: 10 x 13
##    variable  type   nobs  mean    sd mode     min   max   q25 median   q75
##    <chr>     <chr> <int> <dbl> <dbl> <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_ye… Nume…    87  87.6 155.  19         8   896  35       52  72
##  2 height    Nume…    87 174.   34.8 172       66   264 167      180 191
##  3 mass      Nume…    87  97.3 169.  77        15  1358  55.6     79  84.5
##  4 eye_color Char…    87  NA    NA   blue      NA    NA  NA       NA  NA
##  5 gender    Char…    87  NA    NA   male      NA    NA  NA       NA  NA
##  6 hair_col… Char…    87  NA    NA   blond     NA    NA  NA       NA  NA
##  7 homeworld Char…    87  NA    NA   Tatoo…    NA    NA  NA       NA  NA
##  8 name      Char…    87  NA    NA   Luke …    NA    NA  NA       NA  NA
##  9 skin_col… Char…    87  NA    NA   fair      NA    NA  NA       NA  NA
## 10 species   Char…    87  NA    NA   Human     NA    NA  NA       NA  NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
As you can see, the object returned by describe() is a tibble.
For now, this function does not handle dates, but it’s in the pipeline.
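To give a rough idea of what a function like this computes, here is a minimal base R sketch. It is not the {brotools} implementation: describe_sketch(), mode_sketch() and the toy data frame are hypothetical names made up for illustration, and only a few of the statistics shown above are reproduced.

```r
# Hypothetical helper: most frequent non-missing value, returned as a string.
mode_sketch <- function(x) {
  counts <- table(x)  # table() drops NA values by default
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

# Hypothetical sketch of a describe()-like function: one output row per column.
describe_sketch <- function(df) {
  rows <- lapply(names(df), function(v) {
    x <- df[[v]]
    is_num <- is.numeric(x)
    data.frame(
      variable  = v,
      # Simplification: anything that is not numeric is labelled "Character"
      type      = if (is_num) "Numeric" else "Character",
      nobs      = length(x),
      mean      = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
      sd        = if (is_num) sd(x, na.rm = TRUE) else NA_real_,
      mode      = mode_sketch(x),
      median    = if (is_num) median(x, na.rm = TRUE) else NA_real_,
      n_missing = sum(is.na(x)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up toy data, not the starwars dataset
toy <- data.frame(
  height  = c(172, 167, NA, 172),
  species = c("Human", "Droid", "Droid", "Droid"),
  stringsAsFactors = FALSE
)

describe_sketch(toy)
```

Numeric statistics are set to NA for non-numeric columns, which mirrors the shape of the output above, where character variables only have a mode.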
You can also describe only certain columns:
brotools::describe(starwars, height, mass, name)
## # A tibble: 3 x 13
##   variable type    nobs  mean    sd mode      min   max   q25 median   q75
##   <chr>    <chr>  <int> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numer…    87 174.   34.8 172        66   264 167      180 191
## 2 mass     Numer…    87  97.3 169.  77         15  1358  55.6     79  84.5
## 3 name     Chara…    87  NA    NA   Luke S…    NA    NA  NA       NA  NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
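Passing bare column names like this relies on capturing the arguments without evaluating them. One way this could be done in base R is sketched below; select_sketch() and the toy data are hypothetical, and {brotools} may well handle column selection differently.

```r
# Hypothetical sketch: keep only the columns given as unquoted names in `...`.
select_sketch <- function(df, ...) {
  # substitute() captures the unevaluated arguments; deparse() turns each
  # captured symbol into a string such as "height"
  cols <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  if (length(cols) == 0) return(df)  # no columns named: keep everything
  df[, cols, drop = FALSE]
}

# Made-up toy data, not the starwars dataset
toy <- data.frame(
  height = c(172, 167),
  mass   = c(77, 75),
  name   = c("Luke", "C-3PO"),
  stringsAsFactors = FALSE
)

names(select_sketch(toy, height, name))
```

The selected subset can then be summarised exactly like a full data frame, which is why the output above has the same columns as before, just fewer rows.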
If you want to try it out, you can install {brotools} from GitHub:
devtools::install_github("b-rodrigues/brotools")