We start with a simulated Poisson example where the \(\Theta_i\) are drawn from a chi-squared density with 10 degrees of freedom and the \(X_i|\Theta_i\) are Poisson with expectation \(\Theta_i:\)

\[ \Theta_i \sim \chi^2_{10} \mbox{ and } X_i|\Theta_i \sim \mbox{Poisson}(\Theta_i) \]

The \(\Theta_i\) for this setting, with `N = 1000`

observations can be generated as follows.

```
set.seed(238923) ## for reproducibility
N <- 1000
Theta <- rchisq(N, df = 10)
```

Next, the \(X_i|\Theta_i\), for each of `nSIM = 1000`

simulations can be generated as below.

```
nSIM <- 1000
data <- sapply(seq_len(nSIM), function(x) rpois(n = N, lambda = Theta))
```

We take the discrete set \(\mathcal{T}=(1, 2, \ldots, 32)\) as the \(\Theta\)-space and apply the `deconv`

function in the package `deconvolveR`

to estimate \(g(\theta).\)

```
library(deconvolveR)
tau <- seq(1, 32)
results <- apply(data, 2,
function(x) deconv(tau = tau, X = x, ignoreZero = FALSE,
c0 = 1))
```

The default setting for `deconv`

uses the `Poisson`

family and a natural cubic spline basis of degree 5 as \(Q.\) The regularization parameter for this example (`c0`

) is set to 1. The `ignoreZero`

parameter indicates that this dataset contains zero counts, i.e., there zeros have not been truncated. (In the Shakespeare example below, the counts are of words seen in the canon, and so there is a natural truncation at zero.)

Some warnings are emitted by the `nlm`

routine used for optimization, but they are mostly inconsequential.

Since `deconv`

works on a sample at a time, the result above is a list of lists from which various statistics can be extracted. Below, we construct a table of values for various values of \(\Theta\).

```
g <- sapply(results, function(x) x$stats[, "g"])
mean <- apply(g, 1, mean)
SE.g <- sapply(results, function(x) x$stats[, "SE.g"])
sd <- apply(SE.g, 1, mean)
Bias.g <- sapply(results, function(x) x$stats[, "Bias.g"])
bias <- apply(Bias.g, 1, mean)
gTheta <- pchisq(tau, df = 10) - pchisq(c(0, tau[-length(tau)]), df = 10)
gTheta <- gTheta / sum(gTheta)
simData <- data.frame(theta = tau, gTheta = gTheta,
Mean = mean, StdDev = sd, Bias = bias,
CoefVar = sd / mean)
table1 <- transform(simData,
gTheta = 100 * gTheta,
Mean = 100 * Mean,
StdDev = 100 * StdDev,
Bias = 100 * Bias)
```

The table below summarizes the results for some chosen values of \(\theta .\)

`knitr::kable(table1[c(5, 10, 15, 20, 25), ], row.names=FALSE)`

theta | gTheta | Mean | StdDev | Bias | CoefVar |
---|---|---|---|---|---|

5 | 5.6191465 | 5.4416722 | 0.3635378 | -0.1235865 | 0.0668063 |

10 | 9.1646990 | 9.5316279 | 0.4917673 | 0.2588187 | 0.0515932 |

15 | 4.0946148 | 3.3421426 | 0.3119862 | -0.0683187 | 0.0933492 |

20 | 1.1014405 | 0.9810001 | 0.2246999 | -0.1225092 | 0.2290519 |

25 | 0.2255788 | 0.1496337 | 0.0677919 | 0.0602422 | 0.4530522 |

Although, the coefficient of variation of \(\hat{g}(\theta)\) is still large, the \(g(\theta)\) estimates are reasonable.

We can compare the empirical standard deviations and biases of \(g(\hat{\alpha})\) with the approximation given by the formulas in the paper.

```
library(ggplot2)
library(cowplot)
theme_set(theme_get() +
theme(panel.grid.major = element_line(colour = "gray90",
size = 0.2),
panel.grid.minor = element_line(colour = "gray98",
size = 0.5)))
p1 <- ggplot(data = as.data.frame(results[[1]]$stats)) +
geom_line(mapping = aes(x = theta, y = SE.g), color = "black", linetype = "solid") +
geom_line(mapping = aes(x = simData$theta, y = simData$StdDev), color = "red", linetype = "dashed") +
labs(x = expression(theta), y = "Std. Dev")
p2 <- ggplot(data = as.data.frame(results[[1]]$stats)) +
geom_line(mapping = aes(x = theta, y = Bias.g), color = "black", linetype = "solid") +
geom_line(mapping = aes(x = simData$theta, y = simData$Bias), color = "red", linetype = "dashed") +
labs(x = expression(theta), y = "Bias")
plot_grid(plotlist = list(p1, p2), ncol = 2)
```

The approximation is quite good for the standard deviations, but a little too small for the biases.

Here we are given the word counts for the entire Shakespeare canon in the data set `bardWordCount`

. We assume the \(i\)th distinct word appeared \(X_i \sim Poisson(\Theta_i)\) times in the canon.

```
data(bardWordCount)
str(bardWordCount)
```

`## num [1:100] 14376 4343 2292 1463 1043 ...`

We take the support set \(\mathcal{T}\) for \(\Theta\) to be equally spaced on the log-scale and the sample space for \(\mathcal{X}\) to be \((1,2,\ldots,100).\)

```
lambda <- seq(-4, 4.5, .025)
tau <- exp(lambda)
```

Using a regularization parameter of `c0=2`

we can deconvolve the data to get \(\hat{g}.\)

```
result <- deconv(tau = tau, y = bardWordCount, n = 100, c0=2)
stats <- result$stats
```

The plot below shows the Empirical Bayes deconvoluation estimates for the Shakespeare word counts.

```
ggplot() +
geom_line(mapping = aes(x = lambda, y = stats[, "g"])) +
labs(x = expression(log(theta)), y = expression(g(theta)))
```

The quantity \(R(\alpha)\) in the paper (Efron, Biometrika 2015) can be extracted from the `stats`

list; in this case for a regularization parameter of `c0=2`

we can print its value:

`print(result$S)`

`## [1] 0.005534954`

The `stats`

list contains other estimates quantities as well.

As noted in the paper citing this package, about 44 percent of the total mass of \(\hat{g}\) lies below \(\Theta = 1\), which is an underestimate. This can be corrected for by defining \[ \tilde{g} = c_1\hat{g} / (1 - e^{-\theta_j}), \] where \(c_1\) is the constant that normalizes \(\tilde{g}\).

When there is truncation at zero, as is the case here, the `deconvolveR`

package now returns an additional column in `stats[, "tg"]`

which contains this correction for *thinning*. (The default invocation of `deconv`

assumes zero truncation for the Poisson family, argument `ignoreZero = FALSE`

).

```
d <- data.frame(lambda = lambda, g = stats[, "g"], tg = stats[, "tg"], SE.g = stats[, "SE.g"])
indices <- seq(1, length(lambda), 5)
ggplot(data = d) +
geom_line(mapping = aes(x = lambda, y = g)) +
geom_errorbar(data = d[indices, ],
mapping = aes(x = lambda, ymin = g - SE.g, ymax = g + SE.g),
width = .01, color = "blue") +
labs(x = expression(log(theta)), y = expression(g(theta))) +
ylim(0, 0.006) +
geom_line(mapping = aes(x = lambda, y = tg), linetype = "dashed", color = "red")
```

We can now plot the posterior estimates.

```
gPost <- sapply(seq_len(100), function(i) local({tg <- d$tg * result$P[i, ]; tg / sum(tg)}))
plots <- lapply(c(1, 2, 4, 8), function(i) {
ggplot() +
geom_line(mapping = aes(x = tau, y = gPost[, i])) +
labs(x = expression(theta), y = expression(g(theta)),
title = sprintf("x = %d", i))
})
plots <- Map(f = function(p, xlim) p + xlim(0, xlim), plots, list(6, 8, 14, 20))
plot_grid(plotlist = plots, ncol = 2)
```