Package 'tabulator' reference manual

Title:	Efficient Tabulation with Stata-Like Output
Description:	Efficient tabulation with Stata-like output. For each unique value of the variable, it shows the number of observations with that value, proportion of observations with that value, and cumulative proportion, in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab() uses data.table syntax.
Authors:	Sean Higgins [aut, cre]
Maintainer:	Sean Higgins
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2025-02-21 03:14:14 UTC
Source:	https://github.com/skhiggins/tabulator

Efficient quantiles

Description

Produces quantiles of the selected variable(s): quantiles returns the quantile of each variable at each probability in probs. Efficient with big data: if you give it a data.table, quantiles uses data.table syntax.
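The calculation behind quantiles can be approximated in base R by applying stats::quantile() to a single column. This is a sketch under stated assumptions, not the package's implementation. quantiles_sketch() is a hypothetical helper that takes the column name as a string. It also assumes the default interpolation (type = 7) of stats::quantile().

```r
# Hypothetical base-R sketch: quantiles of one column at the given probabilities.
# Assumes the result matches stats::quantile() with its default type = 7.
quantiles_sketch <- function(df, col, probs = seq(0, 1, 0.1), na.rm = FALSE) {
  x <- df[[col]]  # extract the column as a plain vector
  stats::quantile(x, probs = probs, na.rm = na.rm)
}

df <- data.frame(varname = 1:101)
q <- quantiles_sketch(df, "varname")
print(q)  # named vector "0%", "10%", ..., "100%"
```

The package itself takes bare column names through `...`. The sketch uses a string only to stay self-contained.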

Usage

quantiles(df, ..., probs = seq(0, 1, 0.1), na.rm = FALSE)

Arguments

`df`	A data.table, tibble, or data.frame.
`...`	A column or set of columns (without quotation marks).
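`probs`	Numeric vector of probabilities with values in [0,1]. The default, seq(0, 1, 0.1), returns the minimum, the deciles, and the maximum.
`na.rm`	Logical; if TRUE, any NA and NaN values are removed from the selected column(s) before the quantiles are computed.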
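The wording of na.rm matches that of stats::quantile(), so the snippet below illustrates how that function treats missing values. It is an assumption that quantiles forwards na.rm to stats::quantile() in exactly this way.

```r
# Base-R illustration of the na.rm semantics described above.
# Assumption: quantiles() behaves like stats::quantile(), which refuses
# NA/NaN input unless na.rm = TRUE.
x <- c(4, 1, NA, 3, 2, NaN)

# With na.rm = FALSE, stats::quantile() signals an error; capture its message.
res_strict <- tryCatch(
  stats::quantile(x, probs = 0.5, na.rm = FALSE),
  error = function(e) conditionMessage(e)
)
print(res_strict)

# With na.rm = TRUE, both NA and NaN are dropped, leaving 1, 2, 3, 4.
res_drop <- stats::quantile(x, probs = 0.5, na.rm = TRUE)
print(res_drop)  # median of 1:4 is 2.5
```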

Value

Quantile values.

Examples

# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% quantiles(varname)

# data.table: look at top 10% in more detail
a %>% quantiles(varname, probs = seq(0.9, 1, 0.01))

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% quantiles(varname, na.rm = TRUE)

Efficient tabulation

Description

Produces a tabulation: for each unique group formed by the variable(s), tab shows the number of observations in that group, the proportion of observations in that group, and the cumulative proportion, sorted in descending order of frequency. Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tab uses data.table syntax.
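A one-way tabulation of this kind can be sketched in base R. tab_sketch() and its output column names (value, N, prop, cum_prop) are hypothetical. The sketch only illustrates the three quantities described above; it makes no claim about how tab itself is implemented or how it names its columns.

```r
# Hypothetical base-R sketch of a Stata-style one-way tabulation.
tab_sketch <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # frequencies, most common first
  out <- data.frame(
    value = names(counts),
    N     = as.integer(counts),
    stringsAsFactors = FALSE
  )
  out$prop     <- out$N / sum(out$N)  # share of all observations
  out$cum_prop <- cumsum(out$prop)    # running share, ends at 1
  out
}

x <- c("b", "a", "b", "c", "b", "a")
print(tab_sketch(x))
```

Sorting before computing cumsum() is what makes the cumulative proportion follow descending order of frequency.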

Usage

tab(df, ..., by, round)

Arguments

`df`	A data.table, tibble, or data.frame.
`...`	A column or set of columns (without quotation marks).
`by`	A variable by which you want to group observations before tabulating (without quotation marks).
`round`	An integer giving the number of decimal places to which the proportion and cumulative proportion are rounded.
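The by and round arguments can be illustrated with a small base-R sketch. It assumes that, with by, frequencies and proportions are computed separately within each group. It also assumes that round rounds only the proportion columns. Both points are assumptions, and tab_by_sketch() is a hypothetical helper.

```r
# Hypothetical base-R sketch of grouping before tabulating.
# Assumption: proportions are computed within each group, and `digits`
# (standing in for `round`) rounds only the proportion columns.
df <- data.frame(
  grp     = c("x", "x", "x", "y", "y"),
  varname = c(1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

tab_by_sketch <- function(df, col, by, digits = 3) {
  pieces <- lapply(split(df, df[[by]]), function(d) {
    counts <- sort(table(d[[col]]), decreasing = TRUE)
    prop <- as.numeric(counts) / sum(counts)  # share within this group
    data.frame(
      group    = d[[by]][1],
      value    = names(counts),
      N        = as.integer(counts),
      prop     = round(prop, digits),
      cum_prop = round(cumsum(prop), digits),  # round after accumulating
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

print(tab_by_sketch(df, "varname", "grp", digits = 2))
```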

Value

Tabulation (frequency, proportion, and cumulative proportion) for each unique combination of values of the variables given in ... from df.

Examples

# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tab(varname)

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tab(varname, round = 1)

# data.frame
d <- data.frame(varname = sample.int(20, size = 1000000, replace = TRUE))
d %>% tab(varname)

Count distinct categories

Description

Produces a count of unique categories: tabcount shows the number of unique categories for the selected variable(s). Accepts data.table, tibble, or data.frame as input. Efficient with big data: if you give it a data.table, tabcount uses data.table syntax.
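For a single column, the count that tabcount reports corresponds to the number of distinct values. Base R gives this with length(unique()). The snippet is illustrative only and assumes the column contains no NA values.

```r
# Hypothetical sketch: number of distinct categories in one column.
# Assumes the column contains no NA values.
df <- data.frame(varname = c(3, 1, 3, 2, 1))
n_distinct_values <- length(unique(df$varname))
print(n_distinct_values)  # 3
```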

Usage

tabcount(df, ...)

Arguments

`df`	A data.table, tibble, or data.frame.
`...`	A column or set of columns (without quotation marks).

Value

Count of the number of unique groups formed by the variables given in ... from df.
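With several columns, each distinct combination of values counts as one group. The result can therefore be smaller than the product of the per-column counts. The base-R sketch below illustrates this with hypothetical columns region and year; it is not the package's implementation.

```r
# Hypothetical sketch: distinct groups formed by several columns.
df <- data.frame(
  region = c("N", "N", "S", "S", "S"),
  year   = c(2020, 2021, 2020, 2020, 2020),
  stringsAsFactors = FALSE
)

# Each distinct (region, year) pair is one group: (N, 2020), (N, 2021), (S, 2020).
n_groups <- nrow(unique(df[, c("region", "year")]))
print(n_groups)  # 3

# Multiplying the per-column counts would suggest 2 * 2 = 4 possible groups.
n_possible <- length(unique(df$region)) * length(unique(df$year))
print(n_possible)  # 4
```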

Examples

# data.table
library(tabulator)
library(data.table)
library(magrittr)
a <- data.table(varname = sample.int(20, size = 1000000, replace = TRUE))
a %>% tabcount(varname)

# tibble
library(dplyr)
b <- tibble(varname = sample.int(20, size = 1000000, replace = TRUE))
b %>% tabcount(varname)
