This vignette illustrates the standard use of the `PLNLDA` function and the methods accompanying the R6 classes `PLNLDA` and `PLNLDAfit`.

The packages required for the analysis are **PLNmodels**
plus some others for data manipulation and representation:

`library(PLNmodels)`

We illustrate our point with the trichoptera data set, a full description of which can be found in the corresponding vignette. Data preparation is also detailed in the specific vignette.

```
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
```

The `trichoptera` data frame stores a matrix of counts (`trichoptera$Abundance`), a matrix of offsets (`trichoptera$Offset`) and some vectors of covariates (`trichoptera$Wind`, `trichoptera$Temperature`, etc.). In the following, we're particularly interested in the **discrete** covariate `trichoptera$Group`, which corresponds to disjoint time spans during which the catching took place. The correspondence between group labels and time spans is:

Label | Number of consecutive nights | Date |
---|---|---|
1 | 12 | June 59 |
2 | 5 | June 59 |
3 | 5 | June 59 |
4 | 4 | June 59 |
5 | 4 | July 59 |
6 | 1 | June 59 |
7 | 3 | June 60 |
8 | 4 | June 60 |
9 | 5 | June 60 |
10 | 4 | June 60 |
11 | 1 | June 60 |
12 | 1 | July 60 |

In the vein of Fisher (1936) and Rao (1948), we introduce a multi-class LDA model for multivariate count data which is a variant of the Poisson lognormal model of Aitchison and Ho (1989) (see the PLN vignette as a reminder). Indeed, it can be viewed as a PLN model with a discrete group structure in the latent Gaussian space.

This PLN-LDA model can be written in a hierarchical framework where a sample of \(p\)-dimensional observation vectors \(\mathbf{Y}_i\) is related to some \(p\)-dimensional vectors of latent variables \(\mathbf{Z}_i\) and a discrete structure with \(K\) groups in the following way: \[\begin{equation} \begin{array}{rcl} \text{group structure } & \boldsymbol{\mu}_i = \boldsymbol{\mu}_{g_i} & g_i \in \{1, \dots, K\} \\ \text{latent space } & \mathbf{Z}_i \quad \text{indep.} & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol{\Sigma}) \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right) \end{array} \end{equation}\] where \(g_i\) denotes the group sample \(i\) belongs to.

The different parameters \({\boldsymbol\mu}_k \in\mathbb{R}^p\) correspond to the group-specific main effects and the variance matrix \(\boldsymbol{\Sigma}\) is shared among groups. An equivalent way of writing this model is the following: \[\begin{equation} \begin{array}{rcl} \text{latent space } & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i,\boldsymbol\Sigma) & \boldsymbol{\mu}_i = \mathbf{g}_i^\top \mathbf{M} \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right), \end{array} \end{equation}\] where, with a slight abuse of notation, \(\mathbf{g}_i\) is a group-indicator vector of length \(K\) (\(g_{ik} = 1 \Leftrightarrow g_i = k\)) and \(\mathbf{M} = [\boldsymbol{\mu}_1^\top, \dots, \boldsymbol{\mu}_K^\top]^\top\) is a \(K \times p\) matrix collecting the group-specific main effects.
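The stacked indicator vectors \(\mathbf{g}_i\) are exactly the design matrix of a no-intercept factor. A minimal base-R illustration (the factor levels below are made up for the example):

```r
# Group labels for 4 hypothetical samples, K = 3 groups
g <- factor(c("a", "b", "a", "c"))
# Design matrix of a no-intercept factor: G[i, k] = 1 iff sample i is in group k
G <- model.matrix(~ 0 + g)
rowSums(G)  # each row selects exactly one group, so every row sums to 1
```

This is why, in the package, the grouping can be handled like any other discrete covariate.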

Just like PLN, PLN-LDA generalizes to a formulation close to a multivariate generalized linear model where the main effect is due to a linear combination of the discrete group structure, \(d\) covariates \(\mathbf{x}_i\) and a vector \(\mathbf{o}_i\) of \(p\) offsets in sample \(i\). The latent layer then reads \[\begin{equation} \mathbf{Z}_i \sim \mathcal{N}({\mathbf{o}_i + \mathbf{g}_i^\top \mathbf{M} + \mathbf{x}_i^\top\mathbf{B}},\boldsymbol\Sigma) \end{equation}\] where \(\mathbf{B}\) is a \(d\times p\) matrix of regression parameters.
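The generative model above can be sketched in a few lines of base R. This is only an illustrative simulation: all dimensions and parameter values (`n`, `p`, `K`, the entries of `M` and `Sigma`) are assumptions for the example, not quantities from the trichoptera data.

```r
set.seed(1)
n <- 30; p <- 4; K <- 3                      # samples, species, groups (assumed)
g     <- sample(K, n, replace = TRUE)        # group label g_i of each sample
M     <- matrix(rnorm(K * p, sd = 2), K, p)  # K x p group-specific means (assumed)
Sigma <- diag(0.5, p)                        # shared covariance, diagonal here
o     <- matrix(0, n, p)                     # offsets (none in this sketch)
# latent layer: Z_i ~ N(o_i + mu_{g_i}, Sigma), drawn via the Cholesky factor
Z <- o + M[g, ] + matrix(rnorm(n * p), n, p) %*% chol(Sigma)
# observation layer: Y_ij | Z_ij ~ Poisson(exp(Z_ij))
Y <- matrix(rpois(n * p, exp(Z)), n, p)
```

Adding covariates amounts to adding a term `X %*% B` to the mean of the latent layer.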

Given:

- a new observation \(\mathbf{Y}\) with associated offset \(\mathbf{o}\) and covariates \(\mathbf{x}\)
- a model with estimated parameters \(\hat{\boldsymbol{\Sigma}}\), \(\hat{\mathbf{B}}\), \(\hat{\mathbf{M}}\) and group counts \((n_1, \dots, n_K)\)

We can predict the observation’s group using Bayes rule as follows: for \(k \in \{1, \dots, K\}\), compute \[\begin{equation} \begin{aligned} f_k(\mathbf{Y}) & = p(\mathbf{Y} | \mathbf{g} = k, \mathbf{o}, \mathbf{x}, \hat{\mathbf{B}}, \hat{\boldsymbol{\Sigma}}) \\ & = \boldsymbol{\Phi}_{PLN}(\mathbf{Y}; \mathbf{o} + \boldsymbol{\mu}_k + \mathbf{x}^\top \hat{\mathbf{B}}, \hat{\boldsymbol{\Sigma}}) \\ p_k & = \frac{n_k}{\sum_{k' = 1}^K n_{k'}} \end{aligned} \end{equation}\] where \(\boldsymbol{\Phi}_{PLN}(\bullet; \boldsymbol{\mu}, \boldsymbol{\Sigma})\) is the density function of a PLN distribution with parameters \((\boldsymbol{\mu}, \boldsymbol{\Sigma})\). \(f_k(\mathbf{Y})\) and \(p_k\) are respectively plug-in estimates of (i) the probability of observing counts \(\mathbf{Y}\) in a sample from group \(k\) and (ii) the probability that a sample originates from group \(k\).

The posterior probability \(\hat{\pi}_k(\mathbf{Y})\) that observation \(\mathbf{Y}\) belongs to group \(k\) and most likely group \(\hat{k}(\mathbf{Y})\) can thus be defined as \[\begin{equation} \begin{aligned} \hat{\pi}_k(\mathbf{Y}) & = \frac{p_k f_k(\mathbf{Y})}{\sum_{k' = 1}^K p_{k'} f_{k'}(\mathbf{Y})} \\ \hat{k}(\mathbf{Y}) & = \underset{k \in \{1, \dots, K\}}{\arg\max} \hat{\pi}_k(\mathbf{Y}) \end{aligned} \end{equation}\]
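The classification rule above combines log-densities and log-priors and then normalizes. A minimal sketch in base R, where `log_f` stands in for the values \(\log f_k(\mathbf{Y})\) (which PLNmodels computes internally; the numbers below are made up for illustration), using the log-sum-exp trick for numerical stability:

```r
log_f <- c(-210.4, -205.1, -208.7)  # assumed log f_k(Y) for K = 3 groups
n_k   <- c(12, 5, 5)                # group counts n_k
log_p <- log(n_k / sum(n_k))        # log prior proportions p_k
# unnormalized log-posterior, then log-sum-exp normalization
log_post <- log_p + log_f
m        <- max(log_post)
log_post <- log_post - (m + log(sum(exp(log_post - m))))
pi_hat <- exp(log_post)      # hat{pi}_k(Y): sums to 1
k_hat  <- which.max(pi_hat)  # most likely group hat{k}(Y)
```

With these made-up densities the second group dominates, so `k_hat` is 2.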

Classification and prediction are the main objectives in (PLN-)LDA.
To reach this goal, we first need to estimate the model parameters.
Inference in PLN-LDA focuses on the group-specific main effects \(\mathbf{M}\), the regression parameters
\(\mathbf{B}\) and the covariance
matrix \(\boldsymbol\Sigma\).
Technically speaking, we can treat \(\mathbf{g}_i\) as a discrete covariate and
estimate \([\mathbf{M}, \mathbf{B}]\)
using the same strategy as for the standard **PLN** model.
Briefly, we adopt a variational strategy to approximate the
log-likelihood function and optimize the consecutive variational
surrogate of the log-likelihood with a gradient-ascent-based approach.
To this end, we rely on the CCSA algorithm of Svanberg (2002) implemented in the C++ library NLopt (Johnson 2011), which we link to the package.

In the package, the PLN-LDA model is adjusted with the function `PLNLDA`, which we review in this section. This function adjusts the model and stores it in an object of class `PLNLDAfit`, which inherits from the class `PLNfit`, so we strongly recommend the reader to be somewhat comfortable with `PLN` and `PLNfit` before using `PLNLDA` (see the PLN vignette).

We start by adjusting the above model to the trichoptera data set. We use `Group`, the catching time spans, as a discrete structure and the log of the offset to capture differences in sampling luck.

The model can be fitted with the function `PLNLDA` as follows:

```
myLDA_nocov <- PLNLDA(Abundance ~ 0 + offset(log(Offset)),
                      grouping = Group,
                      data = trichoptera)
```

```
##
## Performing discriminant Analysis...
## DONE!
```

Note that `PLNLDA` uses the standard `formula` interface, like every other model in the **PLNmodels** package.

#### The `PLNLDAfit` object

The `myLDA_nocov` variable is an `R6` object with class `PLNLDAfit`, which comes with a couple of methods. The most basic is the `show`/`print` method, which sends a brief summary of the estimation process and of the available methods:

`myLDA_nocov`

```
## Linear Discriminant Analysis for Poisson Lognormal distribution
## ==================================================================
## nb_param loglik BIC ICL
## 391 -799.864 -1560.715 -1162.217
## ==================================================================
## * Useful fields
## $model_par, $latent, $latent_pos, $var_par, $optim_par
## $loglik, $BIC, $ICL, $loglik_vec, $nb_param, $criteria
## * Useful S3 methods
## print(), coef(), sigma(), vcov(), fitted()
## predict(), predict_cond(), standard_error()
## * Additional fields for LDA
## $percent_var, $corr_map, $scores, $group_means
## * Additional S3 methods for LDA
## plot.PLNLDAfit(), predict.PLNLDAfit()
```

Comprehensive information about `PLNLDAfit` is available via `?PLNLDAfit`.

The user can easily access several fields of the `PLNLDAfit` object using `S3` methods; the most interesting ones are:

- the \(p \times p\) covariance matrix \(\hat{\boldsymbol{\Sigma}}\):

`sigma(myLDA_nocov) %>% corrplot::corrplot(is.corr = FALSE)`

- the regression coefficient matrix \(\hat{\mathbf{B}}\) (in this case `NULL` as there are no covariates):

`coef(myLDA_nocov)`

`## NULL`

- the \(p \times K\) matrix of group means \(\mathbf{M}\):

`myLDA_nocov$group_means %>% head() %>% knitr::kable(digits = 2)`

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|
-24.25 | -6.82 | -23.82 | -22.65 | -25.60 | -26.18 | -24.49 | -23.64 | -23.71 | -5.65 | -3.47 | -22.64 |
-24.23 | -24.97 | -5.67 | -22.62 | -7.44 | -26.15 | -24.47 | -23.61 | -23.68 | -23.75 | -21.59 | -4.47 |
-2.38 | -3.96 | -2.36 | -2.92 | -6.49 | -5.54 | -2.69 | -2.03 | -2.71 | -2.65 | -21.70 | -3.85 |
-24.25 | -25.00 | -5.69 | -4.52 | -25.52 | -26.22 | -24.51 | -5.51 | -5.59 | -4.91 | -21.66 | -22.68 |
-0.28 | -0.26 | -0.62 | -1.09 | -0.61 | -0.11 | -0.66 | -0.80 | -0.39 | -0.45 | -0.61 | -1.16 |
-4.00 | -3.32 | -2.93 | -4.47 | -6.72 | -8.00 | -2.85 | -3.06 | -5.53 | -3.99 | -21.57 | -22.59 |

The `PLNLDAfit` class also benefits from two important methods: `plot` and `predict`.

#### The `plot` method

The `plot` method provides easy-to-interpret graphics which reveal here that the groups are well separated:

`plot(myLDA_nocov)`

By default, `plot` shows the first 3 axes of the LDA when there are 4 or more groups and uses special representations for the edge cases of 3 or fewer groups.

`ggplot2`-savvy users who want to make their own representations can extract the \(n \times (K-1)\) matrix of sample scores from the `PLNLDAfit` object…

`myLDA_nocov$scores %>% head() %>% knitr::kable(digits = 2)`

LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 |
---|---|---|---|---|---|---|---|---|---|---|
2946812 | -670612.3 | -51176.50 | -60152.16 | -15984.96 | 7713.94 | -6665.36 | -506.18 | -191.72 | 13.34 | 119.22 |
2945011 | -673635.3 | -51040.25 | -60510.92 | -15825.40 | 7710.27 | -6680.04 | -549.81 | -186.92 | 14.73 | 120.22 |
2929874 | -702476.1 | -49449.80 | -62819.28 | -14508.37 | 7679.98 | -6379.68 | -540.73 | -202.59 | 18.59 | 113.91 |
2912656 | -734933.0 | -47688.44 | -65273.19 | -13064.15 | 7653.68 | -5931.05 | -391.56 | -227.23 | 20.65 | 106.02 |
2922599 | -716071.8 | -48739.24 | -63797.73 | -13926.80 | 7669.42 | -6144.48 | -416.29 | -215.81 | 18.88 | 110.20 |
2935204 | -692219.9 | -50035.27 | -61959.33 | -14991.48 | 7693.40 | -6443.59 | -489.49 | -199.75 | 16.34 | 115.85 |

…or the \(p \times (K-1)\) matrix of correlations between scores and (latent) variables:

`myLDA_nocov$corr_map %>% head() %>% knitr::kable(digits = 2)`

 | LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 |
---|---|---|---|---|---|---|---|---|---|---|---|
Che | -0.24 | 0.29 | 0.57 | 0.20 | 0.54 | -0.51 | 0.24 | -0.50 | -0.83 | -0.47 | -0.04 |
Hyc | -0.22 | -0.33 | 0.03 | 0.39 | -0.07 | -0.03 | 0.67 | 0.89 | 0.15 | -0.17 | -0.59 |
Hym | 0.03 | -0.15 | -0.26 | -0.78 | -0.36 | -0.31 | -0.18 | -0.04 | 0.29 | 0.01 | 0.61 |
Hys | -0.53 | 0.30 | -0.53 | -0.07 | -0.35 | -0.61 | 0.03 | -0.06 | 0.30 | -0.04 | -0.26 |
Psy | 0.37 | 0.11 | 0.23 | -0.27 | 0.46 | -0.03 | -0.15 | -0.07 | -0.53 | -0.60 | 0.35 |
Aga | 0.07 | -0.14 | -0.25 | -0.88 | -0.08 | -0.24 | 0.02 | -0.21 | 0.09 | -0.21 | 0.51 |

#### The `predict` method

The `predict` method has a slightly different behavior than its siblings in other models of **PLNmodels**: the goal of `predict` is to predict the discrete class based on observed *species counts* (rather than predicting counts from known covariates).

By default, `predict` uses the argument `type = "posterior"` to output the matrix of log-posterior probabilities \(\log(\hat{\pi}_k)\):

```
predicted.class <- predict(myLDA_nocov, newdata = trichoptera)
## equivalent to
## predicted.class <- predict(myLDA_nocov, newdata = trichoptera, type = "posterior")
predicted.class %>% head() %>% knitr::kable(digits = 2)
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
-2063.01 | -2092.44 | -2086.28 | -2071.92 | -2100.22 | -2087.95 | -2072.42 | -2124.14 | -2121.96 | -2066.08 | -2232.45 | -2121.11 |
-2045.07 | -2053.95 | -2090.27 | -2057.33 | -2103.04 | -2065.97 | -2050.23 | -2050.77 | -2051.02 | -2050.42 | -2145.18 | -2098.36 |
-2064.00 | -2074.45 | -2159.42 | -2095.52 | -2159.70 | -2084.56 | -2077.60 | -2103.97 | -2095.02 | -2075.87 | -2188.35 | -2173.58 |
-2173.87 | -2182.09 | -2298.61 | -2323.68 | -2289.08 | -2206.30 | -2239.92 | -2318.51 | -2264.82 | -2213.48 | -2472.03 | -2473.18 |
-2090.25 | -2096.42 | -2134.46 | -2150.01 | -2145.35 | -2118.17 | -2115.65 | -2140.15 | -2122.77 | -2106.06 | -2262.43 | -2217.30 |
-2046.83 | -2051.78 | -2072.99 | -2063.05 | -2082.89 | -2063.34 | -2053.22 | -2054.46 | -2053.39 | -2052.38 | -2129.17 | -2103.63 |

You can also show them on the standard (and human-friendly) \([0, 1]\) scale with `scale = "prob"` to get the matrix \(\hat{\pi}_k\):

```
predicted.class <- predict(myLDA_nocov, newdata = trichoptera, scale = "prob")
predicted.class %>% head() %>% knitr::kable(digits = 3)
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
0.955 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.044 | 0 | 0 |
0.984 | 0.000 | 0 | 0 | 0 | 0 | 0.006 | 0.003 | 0.003 | 0.005 | 0 | 0 |
1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |
1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |
0.998 | 0.002 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |
0.986 | 0.007 | 0 | 0 | 0 | 0 | 0.002 | 0.000 | 0.001 | 0.004 | 0 | 0 |

Setting `type = "response"`, we can predict the most likely group \(\hat{k}\) instead:

```
predicted.class <- predict(myLDA_nocov, newdata = trichoptera, type = "response")
predicted.class
```

```
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 1 1 1 1 9 2 1 1 1 1 2 3 2 2 2 3 3 3 9 3 4 1 4 4
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
## 5 5 4 5 6 7 7 7 8 8 8 8 8 1 9 9 9 10 10 10 10 11 12
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12
```

We can check that the predictions are quite similar to the real groups (*this is not a proper validation of the method, as we used the same data set for both model fitting and prediction and are thus at risk of overfitting*).

`table(predicted.class, trichoptera$Group, dnn = c("predicted", "true"))`

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 10 0 0 1 0 0 0 0 1 0 0 0
## 2 1 4 0 0 0 0 0 0 0 0 0 0
## 3 0 1 4 0 0 0 0 0 0 0 0 0
## 4 0 0 0 3 1 0 0 0 0 0 0 0
## 5 0 0 0 0 3 0 0 0 0 0 0 0
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 3 0 0 0 0 0
## 8 0 0 0 0 0 0 0 4 1 0 0 0
## 9 1 0 1 0 0 0 0 0 3 0 0 0
## 10 0 0 0 0 0 0 0 0 0 4 0 0
## 11 0 0 0 0 0 0 0 0 0 0 1 0
## 12 0 0 0 0 0 0 0 0 0 0 0 1
```
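Overall agreement can be read off such a confusion table as the share of counts on the diagonal. A small hedged helper, demonstrated on a made-up 2 x 2 table (the real table would come from `table(predicted.class, trichoptera$Group)` as above):

```r
# Fraction of samples whose predicted group matches the true group
accuracy <- function(confusion) sum(diag(confusion)) / sum(confusion)

# Hypothetical predicted x true counts, for illustration only
toy <- matrix(c(10, 1, 2, 8), nrow = 2)
accuracy(toy)  # (10 + 8) / 21
```

Applied to the fitted model, `accuracy(table(predicted.class, trichoptera$Group))` gives the in-sample agreement rate, with the same overfitting caveat as before.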

Finally, we can get the coordinates of the new data on the same graph as the original ones with `type = "scores"`. This is done by averaging the latent positions \(\hat{\mathbf{Z}}_i + \boldsymbol{\mu}_k\) (found when the sample is assumed to come from group \(k\)) and weighting them with the \(\hat{\pi}_k\). Some samples have compositions that put them very far from their group mean.

```
library(ggplot2)
predicted.scores <- predict(myLDA_nocov, newdata = trichoptera, type = "scores")
colnames(predicted.scores) <- paste0("Axis.", 1:ncol(predicted.scores))
predicted.scores <- as.data.frame(predicted.scores)
predicted.scores$group <- trichoptera$Group
plot(myLDA_nocov, map = "individual", nb_axes = 2, plot = FALSE) +
  geom_point(data = predicted.scores,
             aes(x = Axis.1, y = Axis.2, color = group, label = NULL))
```

It is possible to correct for other covariates before finding the LDA axes that best separate the groups. In our case, we're going to use `Wind` as a covariate and illustrate the main differences with the previous fit:

```
myLDA_cov <- PLNLDA(Abundance ~ Wind + 0 + offset(log(Offset)),
                    grouping = Group,
                    data = trichoptera)
```

```
##
## Performing discriminant Analysis...
## DONE!
```

All fields of our new `PLNLDA` fit can be accessed as before, with similar results. The only important difference is the result of `coef`: since we included a covariate in the model, `coef` now returns a 1-column matrix for \(\hat{\mathbf{B}}\) instead of `NULL`:
`coef(myLDA_cov) %>% head %>% knitr::kable()`

 | Wind |
---|---|
Che | -0.3293527 |
Hyc | 1.4101303 |
Hym | -0.1926967 |
Hys | -0.4545513 |
Psy | 0.0365381 |
Aga | -0.0611851 |

The group-specific main effects can still be accessed with `$group_means`:

`myLDA_cov$group_means %>% head() %>% knitr::kable(digits = 2)`

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|
-18.64 | -10.45 | -22.22 | -17.46 | -23.63 | -40.18 | -27.45 | -25.91 | -17.80 | 3.66 | 10.37 | -21.85 |
-22.17 | -40.23 | -1.82 | -14.96 | -10.80 | -44.62 | -30.82 | -27.71 | -13.68 | -16.11 | -5.40 | 1.48 |
-1.12 | -4.45 | -1.73 | -1.73 | -5.78 | -6.93 | -2.66 | -1.83 | -1.51 | -1.50 | -23.28 | -3.28 |
-22.59 | -30.37 | -3.02 | 0.02 | -25.09 | -35.24 | -27.67 | -5.13 | -0.70 | -0.10 | -17.63 | -23.80 |
-0.05 | -0.42 | -0.47 | -0.85 | -0.44 | -0.43 | -0.59 | -0.71 | -0.07 | -0.15 | -0.19 | -1.03 |
-1.80 | -4.11 | -1.71 | -2.39 | -5.48 | -10.55 | -2.71 | -2.74 | -3.28 | -1.73 | -21.68 | -25.01 |

#### The `plot` method

Once again, the `plot` method is very useful to get a quick look at the results.

`plot(myLDA_cov)`

#### The `predict` method

We can again predict the most likely group for each sample:

`predicted.class_cov <- predict(myLDA_cov, newdata = trichoptera, type = "response")`

and check that we recover the correct class in most cases (again, we used the same data set for model fitting and group prediction only for ease of exposition):

`table(predicted.class_cov, trichoptera$Group, dnn = c("predicted", "true"))`

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 2 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 1 0 0 0 0 0 0 0
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 1 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0 0
## 9 0 0 0 0 0 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 0 0 0
## 11 0 0 0 0 0 0 0 0 0 0 0 0
## 12 12 3 5 4 3 0 2 4 5 4 1 1
```

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log
Normal Distribution.” *Biometrika* 76 (4): 643–53.

Fisher, R. A. 1936. “The Use of Multiple Measurements in Taxonomic
Problems.” *Annals of Eugenics* 7 (2): 179–88.

Johnson, Steven G. 2011. *The NLopt Nonlinear-Optimization
Package*. https://nlopt.readthedocs.io/en/latest/.

Rao, C. Radhakrishna. 1948. “The Utilization of Multiple
Measurements in Problems of Biological Classification.”
*Journal of the Royal Statistical Society. Series B
(Methodological)* 10 (2): 159–203.

Svanberg, Krister. 2002. “A Class of Globally Convergent
Optimization Methods Based on Conservative Convex Separable
Approximations.” *SIAM Journal on Optimization* 12 (2):
555–73.