Warning message with perccalc package

Jorge Cimentada

2018-08-24

While the other vignette shows you how to use perccalc appropriately, there are instances where there are just too few categories to estimate percentiles properly. Imagine estimating a distribution of 100 percentiles (1 through 100) from only three ordered categories: it just sounds far-fetched.
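To make the problem concrete, here is a minimal base R sketch built on a hypothetical toy variable (`education`, with made-up values, not from any real dataset). With three ordered categories, the data pin down the cumulative share of observations at only three points. Every other location on a 1 to 100 percentile scale has to be inferred rather than observed.

```r
# Hypothetical toy data: 10 observations of a 3-category ordered variable
education <- factor(
  c("low", "low", "mid", "mid", "mid", "high", "high", "low", "mid", "high"),
  levels = c("low", "mid", "high"),
  ordered = TRUE
)

# Share of observations falling in each category, in category order
shares <- prop.table(table(education))

# Cumulative share at the top of each category: these are the only
# percentile locations the data actually identify
cum_shares <- cumsum(shares)

# Expressed on a 1-100 percentile scale: 30, 70 and 100
observed_percentiles <- round(100 * cum_shares)
observed_percentiles
```

Three anchor points out of a hundred leave almost the whole distribution to the model's functional form.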

Let’s load our packages.

library(perccalc)
library(dplyr)
library(ggplot2)

For example, take the survey data from the MASS package, which records students' smoking habits and pulse rates.

smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )

The final result is this dataset:

## # A tibble: 237 x 3
##    gender smoke pulse_rate
##    <fct>  <fct>      <int>
##  1 Male   Never         35
##  2 Female Never         40
##  3 Female Never         48
##  4 Male   Never         48
##  5 Female Never         50
##  6 Female Regul         50
##  7 Male   Regul         54
##  8 Male   Never         55
##  9 Male   Never         56
## 10 Male   Never         59
## # ... with 227 more rows

Note that there are only four categories in the smoke variable. Let’s order them from least to most smoking and try to estimate the percentile difference.

smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))
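The explicit `levels` argument above matters because `factor()` sorts levels alphabetically by default, which would rank "Heavy" below "Never". A minimal base R sketch with a hypothetical toy vector `smoke_raw` (the same labels as the survey data, but not the real observations) shows the difference:

```r
# Hypothetical toy vector using the smoking labels from the survey data
smoke_raw <- c("Never", "Regul", "Occas", "Heavy", "Never")

# Default: levels are sorted alphabetically, so the ranking is wrong
default_smoke <- factor(smoke_raw, ordered = TRUE)
levels(default_smoke)   # "Heavy" "Never" "Occas" "Regul"

# Explicit ordering from least to most smoking
smoke <- factor(smoke_raw,
                levels = c("Never", "Occas", "Regul", "Heavy"),
                ordered = TRUE)
levels(smoke)           # "Never" "Occas" "Regul" "Heavy"
nlevels(smoke)          # 4 categories

# Comparisons on ordered factors follow the level order
default_smoke[4] < default_smoke[1]  # TRUE: "Heavy" ranks lowest by default
smoke[4] > smoke[1]                  # TRUE: "Heavy" now ranks highest
```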

perc_diff(smoking_data, smoke, pulse_rate)
## Warning in perc_diff(smoking_data, smoke, pulse_rate): Too few categories
## in categorical variable to estimate the variance-covariance matrix and
## standard errors. Proceeding without estimated standard errors but perhaps
## you should increase the numberof categories
## difference         se 
##   385.1357         NA

perc_diff returns the estimated difference but also warns you that there are too few categories to estimate the standard error, so it proceeds without one and returns se as NA. perc_dist behaves the same way.
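If you run many estimates in a loop, you may prefer to record this warning rather than have it printed. The sketch below uses base R's `withCallingHandlers()` to capture warning messages alongside the result. `mock_perc_diff()` and `capture_estimate()` are hypothetical helpers written for illustration: the mock only imitates the behaviour described above (a warning plus an NA standard error) and is not the perccalc implementation.

```r
# Hypothetical stand-in for perc_diff(): emits a similar warning and returns
# a named vector with an NA standard error (value copied from the output above)
mock_perc_diff <- function() {
  warning("Too few categories in categorical variable to estimate the ",
          "variance-covariance matrix and standard errors.")
  c(difference = 385.1357, se = NA_real_)
}

# Hypothetical helper: evaluate an estimator, collect any warning messages
# instead of printing them, and return both the value and the messages
capture_estimate <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # suppress the printed warning
    }
  )
  list(value = value, warnings = msgs)
}

res <- capture_estimate(mock_perc_diff())
length(res$warnings)        # 1 warning recorded
is.na(res$value[["se"]])    # TRUE: no standard error available
```

With perccalc loaded, the same wrapper should work around the real call, e.g. `capture_estimate(perc_diff(smoking_data, smoke, pulse_rate))`, since the call is only evaluated inside `withCallingHandlers()`.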

perc_dist(smoking_data, smoke, pulse_rate) %>%
  head()
## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories
## in categorical variable to estimate the variance-covariance matrix and
## standard errors. Proceeding without estimated standard errors but perhaps
## you should increase the number of categories
## # A tibble: 6 x 2
##   percentile estimate
##        <int>    <dbl>
## 1          1     24.2
## 2          2     47.8
## 3          3     70.8
## 4          4     93.1
## 5          5    115. 
## 6          6    136.