statsExpressions: Expressions with statistical details
statsExpressions provides the statistical processing backend for the ggstatsplot package, which combines ggplot2 visualizations with expressions containing results from statistical tests. statsExpressions contains all the functions needed to create these expressions.
To get the latest stable release from CRAN, run `install.packages("statsExpressions")` in your R console.
You can get the development version of the package from GitHub. To see what changes (and bug fixes) have been made to the package since the last release on CRAN, check the detailed changelog: https://indrajeetpatil.github.io/statsExpressions/news/index.html
If you are in a hurry and want to reduce installation time, use:
# needed package to download from GitHub repo
utils::install.packages(pkgs = "remotes")
# downloading the package from GitHub
remotes::install_github(
  repo = "IndrajeetPatil/statsExpressions", # package path on GitHub
  dependencies = FALSE, # assumes you have already installed needed packages
  quick = TRUE # skips docs, demos, and vignettes
)
If time is not a constraint:
remotes::install_github(
  repo = "IndrajeetPatil/statsExpressions", # package path on GitHub
  dependencies = TRUE, # installs packages which statsExpressions depends on
  upgrade = "always" # updates any out-of-date dependencies
)
If you want to cite this package in a scientific journal or in any other context, run `citation("statsExpressions")` in your R console.
To see the documentation for the development version of the package, visit the dedicated website for statsExpressions, which is updated after every new commit: https://indrajeetpatil.github.io/statsExpressions/
Currently, the package supports only the most common types of statistical tests, in parametric, non-parametric, robust, and Bayesian versions. The table below summarizes the types of analyses currently supported:
Description | Parametric | Non-parametric | Robust | Bayes Factor |
---|---|---|---|---|
Between group/condition comparisons | Yes | Yes | Yes | Yes |
Within group/condition comparisons | Yes | Yes | Yes | Yes |
Distribution of a numeric variable | Yes | Yes | Yes | Yes |
Correlation between two variables | Yes | Yes | Yes | Yes |
Association between categorical variables | Yes | NA | NA | Yes |
Equal proportions for categorical variable levels | Yes | NA | NA | Yes |
Random-effects meta-analysis | Yes | No | Yes | Yes |
For all statistical test expressions, the default template abides by the APA gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):
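As a rough sketch of how such a templated expression can be assembled, the example below builds an APA-style plotmath expression from a Welch's t-test using only base R. The helper name `apa_t_expression` is hypothetical and is not part of statsExpressions; the package's own templates carry more details (effect sizes, confidence intervals, sample sizes).

```r
# hypothetical helper (not part of statsExpressions): formats a two-sample
# Welch's t-test as a plotmath expression usable in plot titles or subtitles
apa_t_expression <- function(formula, data, digits = 2) {
  res <- stats::t.test(formula, data = data, var.equal = FALSE)

  # format numbers as strings so that trailing zeros are preserved
  fmt <- function(x) format(round(x, digits), nsmall = digits)

  # `substitute()` fills the placeholders with the formatted results
  substitute(
    paste(italic("t"), "(", df, ") = ", stat, ", ", italic("p"), " = ", pval),
    list(
      df = fmt(unname(res$parameter)),
      stat = fmt(unname(res$statistic)),
      pval = format(round(res$p.value, 3), nsmall = 3)
    )
  )
}

# usage: the result is an unevaluated call that plotting functions can render
expr <- apa_t_expression(len ~ supp, data = ToothGrowth)
print(expr)
```

Because the result is a language object rather than a string, it can be passed directly to `ggplot2::labs(subtitle = expr)` or to `title(sub = expr)` in base graphics.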
Here is a summary table of all the statistical tests currently supported across various functions:
Functions | Type | Test | Effect size | 95% CI available? |
---|---|---|---|---|
expr_anova_parametric (2 groups) | Parametric | Student’s and Welch’s t-test | Cohen’s d, Hedges’ g | |
expr_anova_parametric (> 2 groups) | Parametric | Fisher’s and Welch’s one-way ANOVA | | |
expr_anova_nonparametric (2 groups) | Non-parametric | Mann-Whitney U-test | r | |
expr_anova_nonparametric (> 2 groups) | Non-parametric | Kruskal-Wallis rank sum test | | |
expr_anova_robust (2 groups) | Robust | Yuen’s test for trimmed means | | |
expr_anova_robust (> 2 groups) | Robust | Heteroscedastic one-way ANOVA for trimmed means | | |
expr_anova_parametric (2 groups) | Parametric | Student’s t-test | Cohen’s d, Hedges’ g | |
expr_anova_parametric (> 2 groups) | Parametric | Fisher’s one-way repeated measures ANOVA | | |
expr_anova_nonparametric (2 groups) | Non-parametric | Wilcoxon signed-rank test | r | |
expr_anova_nonparametric (> 2 groups) | Non-parametric | Friedman rank sum test | | |
expr_anova_robust (2 groups) | Robust | Yuen’s test on trimmed means for dependent samples | | |
expr_anova_robust (> 2 groups) | Robust | Heteroscedastic one-way repeated measures ANOVA for trimmed means | | |
expr_contingency_tab (unpaired) | Parametric | Pearson’s chi-squared test | Cramér’s V | |
expr_contingency_tab (paired) | Parametric | McNemar’s test | Cohen’s g | |
expr_contingency_tab | Parametric | One-sample proportion test | Cramér’s V | |
expr_corr_test | Parametric | Pearson’s r | r | |
expr_corr_test | Non-parametric | Spearman’s rank correlation | | |
expr_corr_test | Robust | Percentage bend correlation | r | |
expr_t_onesample | Parametric | One-sample t-test | Cohen’s d, Hedges’ g | |
expr_t_onesample | Non-parametric | One-sample Wilcoxon signed-rank test | r | |
expr_t_onesample | Robust | One-sample percentile bootstrap | robust estimator | |
expr_meta_parametric | Parametric | Meta-analysis via random-effects models | | |
expr_meta_robust | Robust | Meta-analysis via robust random-effects models | | |
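Several rows in the table list Cohen's d and Hedges' g as effect sizes. For readers unfamiliar with them, here is a minimal base R sketch of the two-independent-groups case. The function `cohens_d` is a hypothetical helper written for this illustration; statsExpressions may compute these quantities differently (for example, with a different variance estimate or bias correction).

```r
# hypothetical helper (not part of statsExpressions): standardized mean
# difference between two independent groups
cohens_d <- function(x, y, hedges_correction = FALSE) {
  nx <- length(x)
  ny <- length(y)

  # pooled standard deviation
  sd_pooled <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  d <- (mean(x) - mean(y)) / sd_pooled

  # Hedges' g applies an approximate small-sample bias correction to d
  if (hedges_correction) d <- d * (1 - 3 / (4 * (nx + ny) - 9))
  d
}

# tooth length by supplement type in the built-in ToothGrowth data
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
d <- cohens_d(oj, vc)
g <- cohens_d(oj, vc, hedges_correction = TRUE)
```

The correction factor shrinks d slightly toward zero, so g is always a little smaller in magnitude than d, and the two converge as the sample size grows.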
A list of primary functions in this package can be found at the package website: https://indrajeetpatil.github.io/statsExpressions/reference/index.html
Following are a few examples of how these functions can be used.
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# create fake data
df <- data.frame(x = rnorm(1000, 0.1, 0.5))
# creating a histogram plot
p <- ggplot(df, aes(x)) +
  geom_histogram(alpha = 0.5) +
  geom_vline(xintercept = mean(df$x), color = "red")
# adding a caption with a non-parametric one-sample test
p + labs(
  title = "One-Sample Wilcoxon Signed Rank Test",
  caption = expr_t_onesample(df, x, type = "nonparametric")
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
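The note above mentions that the confidence interval was computed from bootstrap samples. The general idea can be sketched in base R with a percentile bootstrap. The helper `boot_ci` and the one-sample standardized mean difference used here are illustrative assumptions, not the effect size or the resampling code that statsExpressions uses internally.

```r
# hypothetical helper (not part of statsExpressions): percentile bootstrap
# confidence interval for any statistic of a numeric vector
boot_ci <- function(x, statistic, nboot = 100, conf.level = 0.95) {
  # recompute the statistic on `nboot` resamples drawn with replacement
  estimates <- replicate(nboot, statistic(sample(x, replace = TRUE)))
  alpha <- 1 - conf.level
  stats::quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
x <- rnorm(1000, 0.1, 0.5)

# one-sample standardized mean difference against a test value of 0
d_onesample <- function(v) mean(v) / sd(v)
ci <- boot_ci(x, d_onesample, nboot = 100)
```

With only 100 resamples the interval limits are fairly noisy; increasing `nboot` gives more stable limits at the cost of computation time.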
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
library(hrbrthemes)
# create a plot
p <-
  ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot() +
  theme_ipsum_rc()
# adding a subtitle with results from a two-sample t-test
p + labs(
  title = "Two-Sample Welch's t-test",
  subtitle = expr_t_parametric(ToothGrowth, supp, len)
)
Let’s say we want to check for differences in sepal length across iris species and wish to carry out Welch’s ANOVA:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# create a boxplot
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(
    title = "Welch's ANOVA",
    subtitle = expr_anova_parametric(iris, Species, Sepal.Length, messages = FALSE)
  )
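For reference, the test named in the plot title can also be run on its own in base R. The sketch below only extracts the test statistic, degrees of freedom, and p-value; it does not reproduce the effect size or the formatted expression that `expr_anova_parametric` returns.

```r
# Welch's one-way ANOVA (does not assume equal variances across groups)
welch <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# the pieces that would go into a reported result
f_value <- unname(welch$statistic)
df_num <- unname(welch$parameter[1])
df_den <- unname(welch$parameter[2])
p_value <- welch$p.value

# a plain-text summary; the denominator df is fractional under Welch's correction
summary_text <- sprintf(
  "F(%g, %.2f) = %.2f, p = %s",
  df_num, df_den, f_value, format.pval(p_value, digits = 3)
)
print(summary_text)
```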
Suppose you change your mind and now want to carry out a robust ANOVA instead. This time, let’s also use a different kind of visualization:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
library(ggridges)
# create a ridgeplot
p <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(
    jittered_points = TRUE, quantile_lines = TRUE,
    scale = 0.9, vline_size = 1, vline_color = "red",
    position = position_raincloud(adjust_vlines = TRUE)
  )
# create an expression containing details from the relevant test
results <- expr_anova_robust(iris, Species, Sepal.Length, messages = FALSE)
# display results on the plot
p + labs(
  title = "A heteroscedastic one-way ANOVA for trimmed means",
  subtitle = results
)
Let’s look at another example, where we want to run a correlation analysis:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# create a scatterplot with a fitted regression line
p <-
  ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  geom_smooth(method = "lm")
# create an expression containing details from the relevant test
results <- expr_corr_test(mtcars, mpg, wt, type = "nonparametric")
# display results on the plot
p + labs(
  title = "Spearman's rank correlation coefficient",
  subtitle = results
)
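The underlying non-parametric test is available in base R as well. The following sketch runs it directly and shows that Spearman's coefficient is Pearson's correlation computed on ranks; it makes no claim about how `expr_corr_test` obtains its confidence interval.

```r
# Spearman's rank correlation; `exact = FALSE` avoids a warning about ties
spearman <- stats::cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
rho <- unname(spearman$estimate)
p_value <- spearman$p.value

# equivalent computation: Pearson's correlation of the (mid)ranks
rho_from_ranks <- cor(rank(mtcars$mpg), rank(mtcars$wt))
```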
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# data
df <- as.data.frame(table(mpg$class))
colnames(df) <- c("class", "freq")
# basic pie chart
p <-
  ggplot(df, aes(x = "", y = freq, fill = factor(class))) +
  geom_bar(width = 1, stat = "identity") +
  theme(
    axis.line = element_blank(),
    plot.title = element_text(hjust = 0.5)
  )
# cleaning up the chart and adding results from one-sample proportion test
p +
  coord_polar(theta = "y", start = 0) +
  labs(
    fill = "class",
    x = NULL,
    y = NULL,
    title = "Pie Chart of class",
    subtitle = expr_onesample_proptest(df, class, counts = freq),
    caption = "One-sample goodness of fit proportion test"
  )
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
You can also use these functions to return the expression itself, without displaying it in a plot:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# Pearson's chi-squared test of independence
expr_contingency_tab(mtcars, am, cyl, messages = FALSE)
#> paste(NULL, chi["Pearson"]^2, "(", "2", ") = ", "8.74", ", ",
#> italic("p"), " = ", "0.013", ", ", widehat(italic("V"))["Cramer"],
#> " = ", "0.46", ", CI"["95%"], " [", "0.08", ", ", "0.75",
#> "]", ", ", italic("n")["obs"], " = ", 32L)
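The test statistic, p-value, and effect size in this expression can be reproduced with base R. The sketch assumes that the reported Cramér's V is the bias-corrected variant; the source does not state this, but the bias-corrected value matches the 0.46 shown above while the uncorrected one does not. The confidence interval is not reproduced here.

```r
# contingency table of transmission type by number of cylinders
tab <- table(mtcars$am, mtcars$cyl)

# Pearson's chi-squared test; warnings about small expected counts are suppressed
res <- suppressWarnings(stats::chisq.test(tab))
chi2 <- unname(res$statistic)
n <- sum(tab)
r <- nrow(tab)
k <- ncol(tab)

# uncorrected Cramér's V
v_raw <- sqrt(chi2 / (n * min(r - 1, k - 1)))

# bias-corrected Cramér's V (assumption: this is the variant reported above)
phi2 <- max(0, chi2 / n - (r - 1) * (k - 1) / (n - 1))
r_adj <- r - (r - 1)^2 / (n - 1)
k_adj <- k - (k - 1)^2 / (n - 1)
v_adj <- sqrt(phi2 / min(r_adj - 1, k_adj - 1))

round(c(chi2 = chi2, p = res$p.value, v_raw = v_raw, v_adj = v_adj), 3)
```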
# setup
set.seed(123)
library(metaviz)
library(ggplot2)
# rename columns to `statsExpressions` conventions
df <- dplyr::rename(mozart, estimate = d, std.error = se)
# meta-analysis forest plot with results from a random-effects meta-analysis
viz_forest(
  x = mozart[, c("d", "se")],
  study_labels = mozart[, "study_name"],
  xlab = "Cohen's d",
  variant = "thick",
  type = "cumulative"
) + # use `statsExpressions` to create expression containing results
  labs(
    title = "Meta-analysis of Pietschnig, Voracek, and Formann (2010) on the Mozart effect",
    subtitle = expr_meta_parametric(df, k = 3)
  ) +
  theme(text = element_text(size = 12))
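To make the `estimate` and `std.error` column convention more concrete, here is a minimal base R sketch of a random-effects meta-analysis using the DerSimonian-Laird estimator. This estimator is chosen here for simplicity and is an assumption; the source does not say which estimator `expr_meta_parametric` relies on. The helper `meta_random_dl` and the toy data are hypothetical.

```r
# hypothetical helper (not part of statsExpressions): random-effects
# meta-analysis with the DerSimonian-Laird estimator of between-study variance
meta_random_dl <- function(estimate, std.error) {
  v <- std.error^2
  w <- 1 / v
  n_studies <- length(estimate)

  # fixed-effect estimate and Cochran's Q heterogeneity statistic
  mu_fixed <- sum(w * estimate) / sum(w)
  q <- sum(w * (estimate - mu_fixed)^2)

  # between-study variance (truncated at zero)
  tau2 <- max(0, (q - (n_studies - 1)) / (sum(w) - sum(w^2) / sum(w)))

  # random-effects weights add tau2 to each study's sampling variance
  w_re <- 1 / (v + tau2)
  mu <- sum(w_re * estimate) / sum(w_re)
  se <- sqrt(1 / sum(w_re))

  data.frame(
    estimate = mu, std.error = se, z = mu / se,
    p.value = 2 * stats::pnorm(-abs(mu / se)),
    tau2 = tau2, n_studies = n_studies
  )
}

# made-up effect sizes and standard errors, for illustration only
toy <- data.frame(
  estimate = c(0.30, 0.10, 0.45, 0.20),
  std.error = c(0.12, 0.15, 0.20, 0.10)
)
res <- meta_random_dl(toy$estimate, toy$std.error)
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with the fixed-effect (inverse-variance weighted) estimate.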
ggstatsplot
Note that these functions were initially written to display results from statistical tests on ready-made ggplot2 plots implemented in ggstatsplot.
For detailed documentation, see the package website: https://indrajeetpatil.github.io/ggstatsplot/
Here is an example from ggstatsplot of what the plots look like when the expressions are displayed in the subtitle:
As the code stands right now, here is the code coverage for all primary functions involved: https://codecov.io/gh/IndrajeetPatil/statsExpressions/tree/master/R
I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I prefer the GitHub issues system over other ways of reaching me (personal e-mail, Twitter, etc.). Pull requests for contributions are encouraged.
Here are some simple ways in which you can contribute (in increasing order of commitment):
Read and correct any inconsistencies in the documentation
Raise issues about bugs or wanted features
Review code
Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.