The `logitr()`

function estimates multinomial logit (MNL) and mixed logit (MXL) models using “Preference” space or “Willingness-to-pay (WTP)” space utility parameterizations. The basic usage for estimating models is as follows:

```
<- logitr(
model
data,
choiceName,
obsIDName,
parNames,priceName = NULL,
randPars = NULL,
randPrice = NULL,
modelSpace = "pref",
weightsName = NULL,
options = list()
)
```

The `data`

argument must be provided a `data.frame`

arranged in “long” or “tidy” format [@Wickham2014]. Each row should be an alternative from a choice observation. The choice observations do not have to be symmetric (i.e. each choice observation could have a different number of alternatives). The data must also include variables for each of the following arguments in the `logitr()`

function:

`choiceName`

: A dummy variable that identifies which alternative was chosen (`1`

is chosen,`0`

is not chosen). Only one alternative should be chosen per choice observation.`obsIDName`

: A sequence of repeated numbers that identifies each unique choice observation For example, if the first three choice observations had 2 alternatives each, then the first 6 rows of the`obsID`

variable would be`1, 1, 2, 2, 3, 3`

.`parNames`

: The names of the variables that will be used as model covariates. For WTP space models, the price variable should**not**be included in`parNames`

as it is provided separately with the`priceName`

argument.

The “Data Formatting and Encoding” vignette has more details about the required data format.

The `logitr()`

function assumes that the deterministic part of the utility function, \(v_{j}\), is linear in parameters, i.e.:

\[\begin{equation} u_{j} = v_{j} + \varepsilon_{j} \hspace{0.5in} \varepsilon_{j} \sim \textrm{Gumbel}\left(0,\frac{\pi^2}{6}\right) \label{eq:utilityComponents} \end{equation}\]

\[\begin{equation} v_{j} = \mathbf{β}' \mathrm{\mathbf{x}}_{j} + \alpha p_{j} \hspace{0.5in} \label{eq:utilityPreferenceObserved} \end{equation}\]

where \(\mathbf{β}\) is the vector of coefficients for non-price attributes \(\mathrm{\mathbf{x}}_{j}\), \(\alpha\) is the coefficient for price \(p_{j}\). Accordingly, each parameter in the `parNames`

argument is an additive part of \(v_{j}\).

For example, if the observed utility for yogurt purchases was \(v_{j} = \beta x_{j} + \alpha p_{j}\), where \(p_{j}\) is “price” and \(x_{j}\) is “brand”, then the `parNames`

argument should be `c("price", "brand")`

. This model can be estimated as follows using the `yogurt`

data frame:

```
<- logitr(
mnl_pref data = yogurt,
choiceName = "choice",
obsIDName = "obsID",
parNames = c("price", "brand"))
)
```

Use the `summary()`

function to view a summary of the estimated model results. In the following summary, three coefficients are shown for the “brand” attribute, with `"dannon"`

set as the reference level:

```
summary(mnl_pref)
#> =================================================
#> MODEL SUMMARY:
#>
#> Model Space: Preference
#> Model Run: 1 of 1
#> Iterations: 21
#> Elapsed Time: 0h:0m:0.13s
#> Exit Status: 3
#> Weights Used?: FALSE
#> robust? FALSE
#>
#> Model Coefficients:
#> Estimate StdError tStat pVal signif
#> price -0.366543 0.024365 -15.0439 0 ***
#> feat 0.491433 0.120061 4.0932 0 ***
#> brandhiland -3.715428 0.145416 -25.5504 0 ***
#> brandweight -0.641128 0.054498 -11.7643 0 ***
#> brandyoplait 0.734496 0.080642 9.1082 0 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Model Fit Values:
#>
#> Log-Likelihood: -2656.8878799
#> Null Log-Likelihood: -3343.7419990
#> AIC: 5323.7758000
#> BIC: 5352.7168000
#> McFadden R2: 0.2054148
#> Adj McFadden R2: 0.2039195
#> Number of Observations: 2412.0000000
```

For models estimated in the WTP space, the `modelSpace`

argument should be set to `"wtp"`

. In addition, the `parNames`

argument should *not* contain the name of the column for the price attribute because there is no estimated “price” parameter in WTP space models. Instead, a “scale” parameter given by \(\lambda\) will be estimated, and the column for “price” should be provided separately using the `priceName`

argument.

For example, if the observed utility was \(v_{j} = \lambda \left(\omega x_{j} - p_{j}\right)\), where \(p_{j}\) is “price” and \(x_{j}\) is “brand”, then the `parNames`

argument should be `"brand"`

and the `priceName`

argument should be `"price"`

. This model can be estimated as follows using the `yogurt`

data frame:

```
<- logitr(
mnl_wtp data = yogurt,
choiceName = "choice",
obsIDName = "obsID",
parNames = "brand",
priceName = "price",
modelSpace = "wtp",
options = list(numMultiStarts = 10)
)
```

Since WTP space models are non-convex, it is recommended to use a multi-start search to run the optimization loop multiple times to search for different local minima. This was implemented in the above example by passing a `numMultiStarts`

argument inside a list to the `options`

argument (other options can be passed in the same list).

In the WTP space model, the estimated “brand” coefficients have units of dollars and can be interpreted as the average willingness to pay for each brand relative to the “Dannon” brand, all else being equal.

To estimate a mixed logit model, use the `randPars`

argument in the `logitr()`

function to denote which parameters will be modeled with a distribution. The current package version supports normal (`"n"`

) and log-normal (`"ln"`

) distributions.

For example, assume the observed utility for yogurts was \(v_{j} = \beta x_{j} + \alpha p_{j}\), where \(p_{j}\) is “price” and \(x_{j}\) is “brand”. To model `"brand"`

parameter, \(\beta\), with a normal or log-normal distribution, set `randPars = c(size = "n")`

or `randPars = c(size = "ln")`

:

```
<- logitr(
mxl_pref data = yogurt,
choiceName = "choice",
obsIDName = "obsID",
parNames = c("price", "brand"),
randPars = c(brand = "n"),
options = list(numMultiStarts = 10)
)
```

Since mixed logit models are non-convex, it is again recommended to use a multi-start search to run the optimization loop multiple times to search for different local minima. Note that mixed logit models can take a long time to estimate, so setting large number for `numMultiStarts`

could take hours or longer to complete.

Use the `summary()`

function to print a summary of the results from an estimated model. The function will print the following based on the model settings:

- For a single model run, it prints some summary information, including the model space (Preference or WTP), log-likelihood value at the solution, and a summary table of the model coefficients.
- For MXL models, the function also prints a summary of the random parameters.
- If you set
`keepAllRuns = TRUE`

in the`options`

argument,`summary()`

will print a summary of all the multistart runs followed by a summary of the best model (as determined by the largest log-likelihood value).

Use `statusCodes()`

to print a description of each status code from the `nloptr`

optimization routine.

For models in the preference space, a summary table of the computed WTP based on the estimated preference space parameters can be obtained with the `wtp()`

function. For example, the computed WTP from the previously estimated fixed parameter model can be obtained with the following command:

```
wtp(mnl_pref, priceName = "price")
#> Estimate StdError tStat pVal signif
#> lambda 0.366543 0.024384 15.0320 0e+00 ***
#> feat 1.340724 0.359525 3.7292 2e-04 ***
#> brandhiland -10.136406 0.582781 -17.3932 0e+00 ***
#> brandweight -1.749121 0.182274 -9.5961 0e+00 ***
#> brandyoplait 2.003848 0.143285 13.9851 0e+00 ***
```

The `wtp()`

function divides the non-price parameters by the negative of the price parameter. Standard errors are estimated using the Krinsky and Robb parametric bootstrapping method [@Krinsky1986]. Similarly, the `wtpCompare()`

function can be used to compare the WTP from a WTP space model with that computed from an equivalent preference space model:

```
wtpCompare(mnl_pref, mnl_wtp, priceName = "price")
#> pref wtp difference
#> lambda 0.366543 0.3665854 0.00004238
#> feat 1.340724 1.3405708 -0.00015315
#> brandhiland -10.136406 -10.1357346 0.00067139
#> brandweight -1.749121 -1.7490629 0.00005807
#> brandyoplait 2.003848 2.0038178 -0.00003018
#> logLik -2656.887880 -2656.8878779 0.00000194
```

Estimated models can be used to predict expected choices and choice probabilities for a set (or multiple sets) of alternatives based on an estimated model. As an example, consider predicting choice probabilities for two of the choice observations from the `yogurt`

dataset:

```
<- subset(
alts %in% c(42, 13),
yogurt, obsID select = c('obsID', 'choice', 'price', 'feat', 'brand')
)
alts#> obsID choice price feat brand
#> 49 13 0 8.1 0 dannon
#> 50 13 0 5.0 0 hiland
#> 51 13 1 8.6 0 weight
#> 52 13 0 10.8 0 yoplait
#> 165 42 1 6.3 0 dannon
#> 166 42 0 6.1 1 hiland
#> 167 42 0 7.9 0 weight
#> 168 42 0 11.5 0 yoplait
```

In the example below, the expected choice probabilities for these two sets of alternatives are computed using the fixed parameter `mnl_pref`

model:

```
<- predictProbs(
probs model = mnl_pref,
alts = alts,
obsIDName = "obsID"
)
probs#> obsID prob_mean prob_low prob_high
#> 49 13 0.43684871 0.41507538 0.45803979
#> 50 13 0.03313009 0.02620996 0.04185603
#> 51 13 0.19155738 0.17623217 0.20774895
#> 52 13 0.33846382 0.31891140 0.35928491
#> 165 42 0.60763975 0.57309149 0.64106191
#> 166 42 0.02602079 0.01841343 0.03671153
#> 167 42 0.17803594 0.16200334 0.19478044
#> 168 42 0.18830353 0.16779635 0.20987379
```

The resulting `probs`

data frame contains the expected choice probabilities for each alternative. The low and high values show a 95% confidence interval, estimated using the Krinsky and Robb parametric bootstrapping method [@Krinsky1986]. You can change the CI level by setting alpha to a different value (e.g. a 90% CI is obtained with alpha = 0.05).

Choice probabilities can also be predicted used WTP space models. For example, the predicted choice probabilities using the `mnl_wtp`

model are nearly identical with those from the `mnl_pref`

model:

```
<- predictProbs(
probs model = mnl_wtp,
alts = alts,
obsIDName = "obsID"
)
probs#> obsID prob_mean prob_low prob_high
#> 49 13 0.43686166 0.41514663 0.45812349
#> 50 13 0.03312933 0.02651411 0.04212799
#> 51 13 0.19154885 0.17606783 0.20749348
#> 52 13 0.33846015 0.31828622 0.35897979
#> 165 42 0.60767236 0.57218829 0.64084438
#> 166 42 0.02601763 0.01836858 0.03677997
#> 167 42 0.17802398 0.16177482 0.19462268
#> 168 42 0.18828603 0.16728440 0.20955128
```

Similar to the `predictProbs()`

function, the `predictChoices()`

function can be used to predict choices using the results of an estimated model. In the example below, choices are predicted for the entire `yogurt`

dataset, which was used to the estimate the `mnl_pref`

model:

```
<- predictChoices(
choices model = mnl_pref,
alts = yogurt,
obsIDName = "obsID"
)
# Preview actual and predicted choices
head(choices[c('obsID', 'choice', 'choice_predict')])
#> obsID choice choice_predict
#> 1.1 1 0 0
#> 1.2 1 0 0
#> 1.3 1 1 1
#> 1.4 1 0 0
#> 2.5 2 1 0
#> 2.6 2 0 0
```

The resulting `choices`

data frame contains the same `alts`

data frame with an additional column, `choice_predict`

, which contains the predicted choices. You can quickly compute the accuracy by dividing the number of correctly predicted choices by the total number of choices:

```
<- subset(choices, choice == 1)
chosen $correct <- chosen$choice == chosen$choice_predict
chosensum(chosen$correct) / nrow(chosen)
#> [1] 0.3681592
```