Introduction

SNMoE (Skew-Normal Mixture-of-Experts) provides a flexible modelling framework for heterogeneous data with possibly skewed distributions, generalizing the standard normal mixture-of-experts model. SNMoE consists of a mixture of K skew-normal expert regressors (polynomials of degree p) gated by a softmax gating network (of degree q), and is represented by:

• The gating network parameters: the alpha’s of the softmax network.
• The expert network parameters: the location parameters (regression coefficients) beta’s, the scale parameters sigma’s, and the skewness parameters lambda’s.

SNMoE thus generalises mixtures of (normal, skew-normal) distributions and mixtures of regressions with these distributions. For example, when $q=0$, we retrieve mixtures of (skew-normal, or normal) regressions, and when both $p=0$ and $q=0$, it is a mixture of (skew-normal, or normal) distributions. It also reduces to the standard (normal, skew-normal) distribution when we only use a single expert ($K=1$).
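Written out, the description above corresponds to the usual skew-normal mixture-of-experts density. The notation here is an assumption made for illustration and is not taken from the package documentation: $\pi_k$ for the gating probabilities, $\mathbf{x}_q$ and $\mathbf{x}_p$ for the polynomial feature vectors of degree q and p, $\phi$ and $\Phi$ for the standard normal density and distribution function, and the convention that the K-th gating vector is fixed at zero.

```latex
\begin{aligned}
f(y \mid x; \Psi) &= \sum_{k=1}^{K} \pi_k(x; \alpha)\,
    \mathrm{SN}\!\left(y;\ \beta_k^{\top}\mathbf{x}_p,\ \sigma_k,\ \lambda_k\right), \\
\pi_k(x; \alpha) &= \frac{\exp\!\left(\alpha_k^{\top}\mathbf{x}_q\right)}
    {\sum_{l=1}^{K} \exp\!\left(\alpha_l^{\top}\mathbf{x}_q\right)},
    \qquad \alpha_K = 0, \\
\mathrm{SN}(y; \mu, \sigma, \lambda) &= \frac{2}{\sigma}\,
    \phi\!\left(\frac{y - \mu}{\sigma}\right)
    \Phi\!\left(\lambda\,\frac{y - \mu}{\sigma}\right).
\end{aligned}
```

Setting every $\lambda_k = 0$ gives $\Phi(0) = 1/2$, so each expert density becomes a normal density and the model reduces to the normal mixture of experts.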

Model estimation/learning is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
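To make the objective concrete, here is a minimal base-R sketch of the observed-data log-likelihood for a two-expert model with linear experts and a linear gate. The helper names (`dskewnorm`, `gate_probs`, `snmoe_loglik`) and the toy data are hypothetical and not part of meteorits. The choice of the last expert as the reference class of the gate is an assumption.

```r
# Skew-normal density with location mu, scale sigma and skewness lambda.
dskewnorm <- function(y, mu, sigma, lambda) {
  z <- (y - mu) / sigma
  2 / sigma * dnorm(z) * pnorm(lambda * z)
}

# Softmax gating probabilities for K = 2 experts and a degree-1 gate.
# Assumption: the second expert is the reference class (coefficients fixed at 0),
# so the softmax reduces to a logistic function.
gate_probs <- function(x, alpha) {
  p1 <- plogis(alpha[1] + alpha[2] * x)
  cbind(p1, 1 - p1)
}

# Observed-data log-likelihood: sum over observations of the log mixture density.
snmoe_loglik <- function(x, y, alpha, beta, sigma, lambda) {
  pik <- gate_probs(x, alpha)
  dens <- sapply(1:2, function(k) {
    mu <- beta[1, k] + beta[2, k] * x      # linear expert mean (p = 1)
    dskewnorm(y, mu, sigma[k], lambda[k])
  })
  sum(log(rowSums(pik * dens)))
}

# Toy inputs (made up for illustration only).
x_toy <- c(-0.5, 0, 0.5)
y_toy <- c(1.0, 0.2, 1.5)
ll <- snmoe_loglik(x_toy, y_toy,
                   alpha  = c(0, 8),
                   beta   = matrix(c(0, -2.5, 0, 2.5), ncol = 2),
                   sigma  = c(1, 1),
                   lambda = c(3, 5))
```

An ECM-type algorithm increases this quantity at each iteration, which is what the iteration logs further down report.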

This document was written in R Markdown, using the knitr package for production.

See help(package="meteorits") for further details and references provided by citation("meteorits").

Application to a simulated dataset

Generate sample

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
sigmak <- c(1, 1) # Standard deviations of the experts
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivSNMoE(alphak = alphak, betak = betak, sigmak = sigmak,
                          lambdak = lambdak, x = x)
y <- sample$y

Set up SNMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-6
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

snmoe <- emSNMoE(X = x, Y = y, K, p, q, n_tries, max_iter,
                 threshold, verbose, verbose_IRLS)
## EM - SNMoE: Iteration: 1 | log-likelihood: -527.287937164066
## EM - SNMoE: Iteration: 2 | log-likelihood: -488.149669819772
## EM - SNMoE: Iteration: 3 | log-likelihood: -486.613979894615
## EM - SNMoE: Iteration: 4 | log-likelihood: -486.302628698495
## EM - SNMoE: Iteration: 5 | log-likelihood: -486.222460715282
## EM - SNMoE: Iteration: 6 | log-likelihood: -486.184660195025
## EM - SNMoE: Iteration: 7 | log-likelihood: -486.153034476555
## EM - SNMoE: Iteration: 8 | log-likelihood: -486.122006681072
## EM - SNMoE: Iteration: 9 | log-likelihood: -486.091566542363
## EM - SNMoE: Iteration: 10 | log-likelihood: -486.062270874981
## EM - SNMoE: Iteration: 11 | log-likelihood: -486.034460049825
## EM - SNMoE: Iteration: 12 | log-likelihood: -486.008263297245
## EM - SNMoE: Iteration: 13 | log-likelihood: -485.983682395756
## EM - SNMoE: Iteration: 14 | log-likelihood: -485.960662236847
## EM - SNMoE: Iteration: 15 | log-likelihood: -485.93911281283
## EM - SNMoE: Iteration: 16 | log-likelihood: -485.91889800827
## EM - SNMoE: Iteration: 17 | log-likelihood: -485.899948784636
## EM - SNMoE: Iteration: 18 | log-likelihood: -485.882220335441
## EM - SNMoE: Iteration: 19 | log-likelihood: -485.865585665219
## EM - SNMoE: Iteration: 20 | log-likelihood: -485.849975250496
## EM - SNMoE: Iteration: 21 | log-likelihood: -485.835317464871
## EM - SNMoE: Iteration: 22 | log-likelihood: -485.8215462664
## EM - SNMoE: Iteration: 23 | log-likelihood: -485.808607071084
## EM - SNMoE: Iteration: 24 | log-likelihood: -485.796444399875
## EM - SNMoE: Iteration: 25 | log-likelihood: -485.784990363605
## EM - SNMoE: Iteration: 26 | log-likelihood: -485.774197263514
## EM - SNMoE: Iteration: 27 | log-likelihood: -485.764028131654
## EM - SNMoE: Iteration: 28 | log-likelihood: -485.754440985716
## EM - SNMoE: Iteration: 29 | log-likelihood: -485.745404877648
## EM - SNMoE: Iteration: 30 | log-likelihood: -485.736886260643
## EM - SNMoE: Iteration: 31 | log-likelihood: -485.728830856893
## EM - SNMoE: Iteration: 32 | log-likelihood: -485.721230890484
## EM - SNMoE: Iteration: 33 | log-likelihood: -485.714036912717
## EM - SNMoE: Iteration: 34 | log-likelihood: -485.707220850139
## EM - SNMoE: Iteration: 35 | log-likelihood: -485.700770898581
## EM - SNMoE: Iteration: 36 | log-likelihood: -485.694657650289
## EM - SNMoE: Iteration: 37 | log-likelihood: -485.688853535926
## EM - SNMoE: Iteration: 38 | log-likelihood: -485.683371909014
## EM - SNMoE: Iteration: 39 | log-likelihood: -485.678178306597
## EM - SNMoE: Iteration: 40 | log-likelihood: -485.673241061917
## EM - SNMoE: Iteration: 41 | log-likelihood: -485.668553347505
## EM - SNMoE: Iteration: 42 | log-likelihood: -485.664108229458
## EM - SNMoE: Iteration: 43 | log-likelihood: -485.659891312708
## EM - SNMoE: Iteration: 44 | log-likelihood: -485.65587084941
## EM - SNMoE: Iteration: 45 | log-likelihood: -485.652051592504
## EM - SNMoE: Iteration: 46 | log-likelihood: -485.648423458796
## EM - SNMoE: Iteration: 47 | log-likelihood: -485.644956903056
## EM - SNMoE: Iteration: 48 | log-likelihood: -485.641651379967
## EM - SNMoE: Iteration: 49 | log-likelihood: -485.638504265308
## EM - SNMoE: Iteration: 50 | log-likelihood: -485.63550427347
## EM - SNMoE: Iteration: 51 | log-likelihood: -485.632648684527
## EM - SNMoE: Iteration: 52 | log-likelihood: -485.629926044387
## EM - SNMoE: Iteration: 53 | log-likelihood: -485.627320251661
## EM - SNMoE: Iteration: 54 | log-likelihood: -485.624829419361
## EM - SNMoE: Iteration: 55 | log-likelihood: -485.622453305036
## EM - SNMoE: Iteration: 56 | log-likelihood: -485.620178199553
## EM - SNMoE: Iteration: 57 | log-likelihood: -485.617996552235
## EM - SNMoE: Iteration: 58 | log-likelihood: -485.615918885241
## EM - SNMoE: Iteration: 59 | log-likelihood: -485.61393912745
## EM - SNMoE: Iteration: 60 | log-likelihood: -485.61203778135
## EM - SNMoE: Iteration: 61 | log-likelihood: -485.610218075827
## EM - SNMoE: Iteration: 62 | log-likelihood: -485.608475863347
## EM - SNMoE: Iteration: 63 | log-likelihood: -485.606800073052
## EM - SNMoE: Iteration: 64 | log-likelihood: -485.605189380751
## EM - SNMoE: Iteration: 65 | log-likelihood: -485.603648407257
## EM - SNMoE: Iteration: 66 | log-likelihood: -485.60217125484
## EM - SNMoE: Iteration: 67 | log-likelihood: -485.600766619527
## EM - SNMoE: Iteration: 68 | log-likelihood: -485.599407085375
## EM - SNMoE: Iteration: 69 | log-likelihood: -485.59809908388
## EM - SNMoE: Iteration: 70 | log-likelihood: -485.59684184304
## EM - SNMoE: Iteration: 71 | log-likelihood: -485.595629799638
## EM - SNMoE: Iteration: 72 | log-likelihood: -485.59447564897
## EM - SNMoE: Iteration: 73 | log-likelihood: -485.593371612486
## EM - SNMoE: Iteration: 74 | log-likelihood: -485.592313444969
## EM - SNMoE: Iteration: 75 | log-likelihood: -485.591295083416
## EM - SNMoE: Iteration: 76 | log-likelihood: -485.590316544476
## EM - SNMoE: Iteration: 77 | log-likelihood: -485.5893686805
## EM - SNMoE: Iteration: 78 | log-likelihood: -485.588445462352
## EM - SNMoE: Iteration: 79 | log-likelihood: -485.587558943622
## EM - SNMoE: Iteration: 80 | log-likelihood: -485.586704633952
## EM - SNMoE: Iteration: 81 | log-likelihood: -485.585878110093
## EM - SNMoE: Iteration: 82 | log-likelihood: -485.585078538216
## EM - SNMoE: Iteration: 83 | log-likelihood: -485.584310754457
## EM - SNMoE: Iteration: 84 | log-likelihood: -485.583572491005
## EM - SNMoE: Iteration: 85 | log-likelihood: -485.582860765507
## EM - SNMoE: Iteration: 86 | log-likelihood: -485.58217443264
## EM - SNMoE: Iteration: 87 | log-likelihood: -485.581510965869
## EM - SNMoE: Iteration: 88 | log-likelihood: -485.580867196463
## EM - SNMoE: Iteration: 89 | log-likelihood: -485.580242663066
## EM - SNMoE: Iteration: 90 | log-likelihood: -485.579645636856
## EM - SNMoE: Iteration: 91 | log-likelihood: -485.579071362399
## EM - SNMoE: Iteration: 92 | log-likelihood: -485.578512662018
## EM - SNMoE: Iteration: 93 | log-likelihood: -485.577973190244
## EM - SNMoE: Iteration: 94 | log-likelihood: -485.577452194271
## EM - SNMoE: Iteration: 95 | log-likelihood: -485.576948142351
## EM - SNMoE: Iteration: 96 | log-likelihood: -485.576456396579
## EM - SNMoE: Iteration: 97 | log-likelihood: -485.575974064756

Summary

snmoe$summary()
## -----------------------------------------------
## Fitted Skew-Normal Mixture-of-Experts model
## -----------------------------------------------
##
## SNMoE model with K = 2 experts:
##
##  log-likelihood df      AIC      BIC       ICL
##        -485.576 10 -495.576 -516.649 -516.6574
##
## Clustering table (Number of observations in each expert):
##
##   1   2
## 249 251
##
## Regression coefficients:
##
##     Beta(k = 1) Beta(k = 2)
## 1      1.051904    1.013374
## X^1    3.004689   -2.778066
##
## Variances:
##
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.3738266     0.4534028
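The df, AIC and BIC values printed in the summary are consistent with the arithmetic below. The formulas are inferred from the printed numbers, not taken from the package source, so treat them as an assumption: each criterion is the log-likelihood minus a penalty, which means larger values are better under this convention.

```r
K <- 2; p <- 1; q <- 1      # model orders used above
n <- 500                    # sample size used above
loglik <- -485.576          # log-likelihood as printed in the summary

# Free parameters: gating (q + 1)(K - 1), regression coefficients (p + 1)K,
# plus one scale and one skewness parameter per expert.
df <- (q + 1) * (K - 1) + (p + 1) * K + K + K

aic <- loglik - df                  # assumed convention: loglik - df
bic <- loglik - df * log(n) / 2     # assumed convention: loglik - df * log(n) / 2

round(c(df = df, AIC = aic, BIC = bic), 3)
```

This reproduces df = 10, AIC = -495.576 and BIC = -516.649 from the table.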

Application to a real dataset

data("tempanomalies")
x <- tempanomalies$Year
y <- tempanomalies$AnnualAnomaly

Set up SNMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)
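As a rough illustration of what the orders p and q control, the sketch below builds polynomial design matrices of the corresponding degree in base R. The names `x_toy`, `X_beta` and `X_alpha` are hypothetical, and the package may construct its design matrices differently.

```r
p <- 1                       # polynomial order of the experts
q <- 1                       # polynomial order of the gating network
x_toy <- c(-1, 0, 0.5, 1)    # made-up inputs for illustration

# Columns are x^0, x^1, ..., x^p: an intercept column followed by powers of x.
X_beta  <- outer(x_toy, 0:p, "^")   # design matrix for the expert means
X_alpha <- outer(x_toy, 0:q, "^")   # design matrix for the gating network

X_beta
```

With p = 1 each expert mean is a straight line (intercept and slope), and with q = 1 the gating probabilities are a logistic function of a linear term in x.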

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-6
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

snmoe <- emSNMoE(X = x, Y = y, K, p, q, n_tries, max_iter,
                 threshold, verbose, verbose_IRLS)
## EM - SNMoE: Iteration: 1 | log-likelihood: 67.1393912546267
## EM - SNMoE: Iteration: 2 | log-likelihood: 86.3123763058244
## EM - SNMoE: Iteration: 3 | log-likelihood: 88.4049020398015
## EM - SNMoE: Iteration: 4 | log-likelihood: 88.7786025096324
## EM - SNMoE: Iteration: 5 | log-likelihood: 88.9863371759242
## EM - SNMoE: Iteration: 6 | log-likelihood: 89.2159102763086
## EM - SNMoE: Iteration: 7 | log-likelihood: 89.4166837570103
## EM - SNMoE: Iteration: 8 | log-likelihood: 89.5378228423525
## EM - SNMoE: Iteration: 9 | log-likelihood: 89.6078941897507
## EM - SNMoE: Iteration: 10 | log-likelihood: 89.6506081922485
## EM - SNMoE: Iteration: 11 | log-likelihood: 89.680679493927
## EM - SNMoE: Iteration: 12 | log-likelihood: 89.7054127986757
## EM - SNMoE: Iteration: 13 | log-likelihood: 89.7271627052861
## EM - SNMoE: Iteration: 14 | log-likelihood: 89.7466422435391
## EM - SNMoE: Iteration: 15 | log-likelihood: 89.7644359313908
## EM - SNMoE: Iteration: 16 | log-likelihood: 89.7808442763708
## EM - SNMoE: Iteration: 17 | log-likelihood: 89.7959623872005
## EM - SNMoE: Iteration: 18 | log-likelihood: 89.8098298887156
## EM - SNMoE: Iteration: 19 | log-likelihood: 89.8224765128155
## EM - SNMoE: Iteration: 20 | log-likelihood: 89.8339351359208
## EM - SNMoE: Iteration: 21 | log-likelihood: 89.8444584666489
## EM - SNMoE: Iteration: 22 | log-likelihood: 89.8539391972029
## EM - SNMoE: Iteration: 23 | log-likelihood: 89.8623392185522
## EM - SNMoE: Iteration: 24 | log-likelihood: 89.8697291463709
## EM - SNMoE: Iteration: 25 | log-likelihood: 89.8763827151644
## EM - SNMoE: Iteration: 26 | log-likelihood: 89.8811754383375
## EM - SNMoE: Iteration: 27 | log-likelihood: 89.8860645132145
## EM - SNMoE: Iteration: 28 | log-likelihood: 89.8901911599733
## EM - SNMoE: Iteration: 29 | log-likelihood: 89.8939229923584
## EM - SNMoE: Iteration: 30 | log-likelihood: 89.897264155598
## EM - SNMoE: Iteration: 31 | log-likelihood: 89.9007321568667
## EM - SNMoE: Iteration: 32 | log-likelihood: 89.9035508488742
## EM - SNMoE: Iteration: 33 | log-likelihood: 89.9060694862566
## EM - SNMoE: Iteration: 34 | log-likelihood: 89.9086672705961
## EM - SNMoE: Iteration: 35 | log-likelihood: 89.9109149161921
## EM - SNMoE: Iteration: 36 | log-likelihood: 89.9130049122629
## EM - SNMoE: Iteration: 37 | log-likelihood: 89.9151466747962
## EM - SNMoE: Iteration: 38 | log-likelihood: 89.9170490540402
## EM - SNMoE: Iteration: 39 | log-likelihood: 89.9189455614356
## EM - SNMoE: Iteration: 40 | log-likelihood: 89.920722490437
## EM - SNMoE: Iteration: 41 | log-likelihood: 89.9223861223175
## EM - SNMoE: Iteration: 42 | log-likelihood: 89.9240011170035
## EM - SNMoE: Iteration: 43 | log-likelihood: 89.9255444752544
## EM - SNMoE: Iteration: 44 | log-likelihood: 89.9270147197148
## EM - SNMoE: Iteration: 45 | log-likelihood: 89.9284205205757
## EM - SNMoE: Iteration: 46 | log-likelihood: 89.929768350036
## EM - SNMoE: Iteration: 47 | log-likelihood: 89.9310655713287
## EM - SNMoE: Iteration: 48 | log-likelihood: 89.9323114372458
## EM - SNMoE: Iteration: 49 | log-likelihood: 89.9335083111587
## EM - SNMoE: Iteration: 50 | log-likelihood: 89.9346590487228
## EM - SNMoE: Iteration: 51 | log-likelihood: 89.9357648946395
## EM - SNMoE: Iteration: 52 | log-likelihood: 89.9368284790995
## EM - SNMoE: Iteration: 53 | log-likelihood: 89.9378517785344
## EM - SNMoE: Iteration: 54 | log-likelihood: 89.9388344884152
## EM - SNMoE: Iteration: 55 | log-likelihood: 89.9397794710125
## EM - SNMoE: Iteration: 56 | log-likelihood: 89.9406929038835
## EM - SNMoE: Iteration: 57 | log-likelihood: 89.9415721977169
## EM - SNMoE: Iteration: 58 | log-likelihood: 89.9424179529526
## EM - SNMoE: Iteration: 59 | log-likelihood: 89.9432317703868
## EM - SNMoE: Iteration: 60 | log-likelihood: 89.9440151036607
## EM - SNMoE: Iteration: 61 | log-likelihood: 89.9447720669891
## EM - SNMoE: Iteration: 62 | log-likelihood: 89.9455021664009
## EM - SNMoE: Iteration: 63 | log-likelihood: 89.9462065398637
## EM - SNMoE: Iteration: 64 | log-likelihood: 89.9468856981156
## EM - SNMoE: Iteration: 65 | log-likelihood: 89.9475410134714
## EM - SNMoE: Iteration: 66 | log-likelihood: 89.9481732090574
## EM - SNMoE: Iteration: 67 | log-likelihood: 89.9487828085701
## EM - SNMoE: Iteration: 68 | log-likelihood: 89.9493709174674
## EM - SNMoE: Iteration: 69 | log-likelihood: 89.9499393216653
## EM - SNMoE: Iteration: 70 | log-likelihood: 89.9504915641522
## EM - SNMoE: Iteration: 71 | log-likelihood: 89.9510234324277
## EM - SNMoE: Iteration: 72 | log-likelihood: 89.9515375509019
## EM - SNMoE: Iteration: 73 | log-likelihood: 89.9520343897918
## EM - SNMoE: Iteration: 74 | log-likelihood: 89.9525147730548
## EM - SNMoE: Iteration: 75 | log-likelihood: 89.952979526795
## EM - SNMoE: Iteration: 76 | log-likelihood: 89.9534287405897
## EM - SNMoE: Iteration: 77 | log-likelihood: 89.9538633332105
## EM - SNMoE: Iteration: 78 | log-likelihood: 89.9542840954176
## EM - SNMoE: Iteration: 79 | log-likelihood: 89.9546914335969
## EM - SNMoE: Iteration: 80 | log-likelihood: 89.9550861492999
## EM - SNMoE: Iteration: 81 | log-likelihood: 89.9554686454909
## EM - SNMoE: Iteration: 82 | log-likelihood: 89.9558386903462
## EM - SNMoE: Iteration: 83 | log-likelihood: 89.9561975428098
## EM - SNMoE: Iteration: 84 | log-likelihood: 89.956545549163
## EM - SNMoE: Iteration: 85 | log-likelihood: 89.9568826067365
## EM - SNMoE: Iteration: 86 | log-likelihood: 89.9572095986266
## EM - SNMoE: Iteration: 87 | log-likelihood: 89.9575263695436
## EM - SNMoE: Iteration: 88 | log-likelihood: 89.9578328566839
## EM - SNMoE: Iteration: 89 | log-likelihood: 89.9581293780223
## EM - SNMoE: Iteration: 90 | log-likelihood: 89.9584173442332
## EM - SNMoE: Iteration: 91 | log-likelihood: 89.958697543531
## EM - SNMoE: Iteration: 92 | log-likelihood: 89.95897017134
## EM - SNMoE: Iteration: 93 | log-likelihood: 89.9592343217354
## EM - SNMoE: Iteration: 94 | log-likelihood: 89.959490268592
## EM - SNMoE: Iteration: 95 | log-likelihood: 89.9597407658552
## EM - SNMoE: Iteration: 96 | log-likelihood: 89.9599830242252
## EM - SNMoE: Iteration: 97 | log-likelihood: 89.960219158931
## EM - SNMoE: Iteration: 98 | log-likelihood: 89.9604487697759
## EM - SNMoE: Iteration: 99 | log-likelihood: 89.9606701685812
## EM - SNMoE: Iteration: 100 | log-likelihood: 89.9608852187594
## EM - SNMoE: Iteration: 101 | log-likelihood: 89.9610939894636
## EM - SNMoE: Iteration: 102 | log-likelihood: 89.9612985304711
## EM - SNMoE: Iteration: 103 | log-likelihood: 89.961496994385
## EM - SNMoE: Iteration: 104 | log-likelihood: 89.9616903747286
## EM - SNMoE: Iteration: 105 | log-likelihood: 89.9618790690262
## EM - SNMoE: Iteration: 106 | log-likelihood: 89.9620614678624
## EM - SNMoE: Iteration: 107 | log-likelihood: 89.9622377985414
## EM - SNMoE: Iteration: 108 | log-likelihood: 89.9624112482239
## EM - SNMoE: Iteration: 109 | log-likelihood: 89.9625810627667
## EM - SNMoE: Iteration: 110 | log-likelihood: 89.9627449576569
## EM - SNMoE: Iteration: 111 | log-likelihood: 89.9629049110195
## EM - SNMoE: Iteration: 112 | log-likelihood: 89.9630633947957
## EM - SNMoE: Iteration: 113 | log-likelihood: 89.9632165833158
## EM - SNMoE: Iteration: 114 | log-likelihood: 89.9633637034398
## EM - SNMoE: Iteration: 115 | log-likelihood: 89.9635083452088
## EM - SNMoE: Iteration: 116 | log-likelihood: 89.9636499016958
## EM - SNMoE: Iteration: 117 | log-likelihood: 89.9637870583276
## EM - SNMoE: Iteration: 118 | log-likelihood: 89.9639202934018
## EM - SNMoE: Iteration: 119 | log-likelihood: 89.9640519846681
## EM - SNMoE: Iteration: 120 | log-likelihood: 89.964180667269
## EM - SNMoE: Iteration: 121 | log-likelihood: 89.9643046747079
## EM - SNMoE: Iteration: 122 | log-likelihood: 89.9644253123161
## EM - SNMoE: Iteration: 123 | log-likelihood: 89.9645423331732
## EM - SNMoE: Iteration: 124 | log-likelihood: 89.9646558210273
## EM - SNMoE: Iteration: 125 | log-likelihood: 89.9647663127239
## EM - SNMoE: Iteration: 126 | log-likelihood: 89.9648744243076
## EM - SNMoE: Iteration: 127 | log-likelihood: 89.9649800581561
## EM - SNMoE: Iteration: 128 | log-likelihood: 89.9650828879559
## EM - SNMoE: Iteration: 129 | log-likelihood: 89.9651846451398
## EM - SNMoE: Iteration: 130 | log-likelihood: 89.9652861758818
## EM - SNMoE: Iteration: 131 | log-likelihood: 89.9653850801511
## EM - SNMoE: Iteration: 132 | log-likelihood: 89.9654812002778
## EM - SNMoE: Iteration: 133 | log-likelihood: 89.9655748811957
## EM - SNMoE: Iteration: 134 | log-likelihood: 89.9656663487702
## EM - SNMoE: Iteration: 135 | log-likelihood: 89.9657558236992
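The run stops at iteration 135, well before max_iter. The last few log-likelihood values are consistent with a stopping rule based on the relative change between consecutive iterations falling below threshold. This rule is inferred from the log above and is an assumption about the package's behaviour, not a statement of its implementation. The function `rel_change` is a hypothetical helper.

```r
threshold <- 1e-6

# Log-likelihood values copied from iterations 133 to 135 of the log above.
ll_133 <- 89.9655748811957
ll_134 <- 89.9656663487702
ll_135 <- 89.9657558236992

# Relative change between two consecutive log-likelihood values.
rel_change <- function(new, old) abs((new - old) / old)

rel_change(ll_134, ll_133) < threshold   # FALSE: keep iterating
rel_change(ll_135, ll_134) < threshold   # TRUE: stop at iteration 135
```

The absolute change at the last step is about 9e-5, which is larger than the threshold, so an absolute-change rule would not have stopped here.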

Summary

Log-likelihood

snmoe$plot(what = "loglikelihood")