Title: | Efficient Tabulation with Stata-Like Output |
---|---|
Description: | Efficient tabulation with Stata-like output. For each unique value of the variable, it shows the number of observations with that value, proportion of observations with that value, and cumulative proportion, in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab() uses data.table syntax. |
Authors: | Sean Higgins [aut, cre] |
Maintainer: | Sean Higgins <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-02-21 03:14:14 UTC |
Source: | https://github.com/skhiggins/tabulator |
Produces quantiles of the selected variables: quantiles shows the quantile values at the requested probabilities. Efficient with big data: if you give it a data.table, quantiles uses data.table syntax.
quantiles(df, ..., probs = seq(0, 1, 0.1), na.rm = FALSE)
df | A data.table, tibble, or data.frame. |
... | A column or set of columns (without quotation marks). |
probs | Numeric vector of probabilities with values in [0,1]. |
na.rm | Logical; if TRUE, NA and NaN values are removed from the selected columns before the quantiles are computed. |
Quantile values.
# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% quantiles(varname)

# data.table: look at top 10% in more detail
a %>% quantiles(varname, probs = seq(0.9, 1, 0.01))

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% quantiles(varname, na.rm = TRUE)
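To make the roles of probs and na.rm concrete, here is a minimal base-R sketch of the kind of computation involved. It uses stats::quantile directly and is only an illustration; it is an assumption, not something stated above, that quantiles() behaves the same way internally. The vector names are made up for the example.

```r
# Illustration only: base-R quantile computation, not the package's own code.
set.seed(1)
x <- sample.int(20, size = 1000, replace = TRUE)  # hypothetical data
x_with_na <- c(x, NA)                             # same data plus one missing value

probs <- seq(0, 1, 0.1)                           # deciles, matching the default probs

# Quantiles of the complete data
q <- stats::quantile(x, probs = probs)

# With na.rm = TRUE the missing value is dropped before computing,
# so the result matches the complete-data quantiles
q_na <- stats::quantile(x_with_na, probs = probs, na.rm = TRUE)

print(q)
```

In base R, calling stats::quantile on x_with_na with na.rm = FALSE raises an error, which is why the tibble example above passes na.rm = TRUE when missing values may be present.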
Produces a tabulation: for each unique group from the variable(s), tab shows the number of observations with that value, proportion of observations with that value, and cumulative proportion, in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab uses data.table syntax.
tab(df, ..., by, round)
df | A data.table, tibble, or data.frame. |
... | A column or set of columns (without quotation marks). |
by | A variable by which you want to group observations before tabulating (without quotation marks). |
round | An integer indicating the number of digits for proportion and cumulative proportion. |
Tabulation (frequencies, proportion, cumulative proportion) for each unique value of the variables given in ... from df.
# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tab(varname)

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tab(varname, round = 1)

# data.frame
c <- data.frame(varname = sample.int(20, size = 1000000, replace = TRUE))
c %>% tab(varname)
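The tabulation described above (count, proportion, and cumulative proportion, sorted by descending frequency) can be sketched in base R as follows. The helper tab_sketch and its column names are hypothetical and chosen for this illustration; this is not the package's implementation, which per the description uses data.table syntax when given a data.table.

```r
# Hypothetical base-R helper for illustration; not the package's implementation.
tab_sketch <- function(x, digits = 2) {
  # Count each unique value and sort by descending frequency
  counts <- sort(table(x), decreasing = TRUE)
  # Unrounded proportions, so the cumulative sum does not accumulate rounding error
  prop <- as.numeric(counts) / sum(counts)
  data.frame(
    value = names(counts),
    N = as.integer(counts),
    prop = round(prop, digits),
    cum_prop = round(cumsum(prop), digits),
    stringsAsFactors = FALSE
  )
}

out <- tab_sketch(c("a", "b", "a", "c", "a", "b"))
print(out)
```

Rounding is applied after the cumulative sum is taken, so the last cumulative proportion is always 1 regardless of the number of digits requested.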
Produces a count of unique categories: tabcount shows the number of unique categories for the selected variable(s). Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tabcount uses data.table syntax.
tabcount(df, ...)
df | A data.table, tibble, or data.frame. |
... | A column or set of columns (without quotation marks). |
Count of the number of unique groups formed by the variables given in ... from df.
# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tabcount(varname)

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tabcount(varname)
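For intuition about what "unique groups formed by the variables" means when more than one column is passed, a small base-R sketch may help. The data frame and column names below are invented for the example, and the code uses only base R rather than the package itself.

```r
# Illustration only: counting unique groups in base R with made-up data.
df <- data.frame(
  g1 = c("a", "a", "b", "b", "b"),
  g2 = c(1, 1, 1, 2, 2)
)

# One variable: number of distinct values of g1
n_g1 <- length(unique(df$g1))

# Two variables: number of distinct (g1, g2) combinations
n_groups <- nrow(unique(df[, c("g1", "g2")]))

print(c(g1 = n_g1, g1_g2 = n_groups))
```

Here g1 has two distinct values, while the pair (g1, g2) forms three distinct groups: (a, 1), (b, 1), and (b, 2).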