Before starting, it should be mentioned that the sole purpose of this vignette is to provide intuitive and easily replicable instructions on how to use the FVDDPpkg package in R. For this reason, the underlying theory will not be developed except in the minimum necessary terms; for more information, please refer to the bibliography cited here.

First of all, import the package.
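Assuming FVDDPpkg has already been installed, this amounts to a single call:

```
library(FVDDPpkg)
```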

As shown in the work of Papaspiliopoulos, Ruggiero, and Spanò (2016), Fleming-Viot Dependent Dirichlet Processes (FVDDP), conditioned on observed data \(Y\), take the general form of finite mixtures of Dirichlet Processes: in fact \[X_t \ | \ Y \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where

- \(X_t\) is the state of the process, also denoted as *latent signal* in the related literature. Its dependence on the time \(t\) is crucial, since the aim of this package is to perform the computations required to estimate its state at different times.
- \(\mathbf{y}^\ast = (y_1^\ast, \dots, y_K^\ast)\) is a \(K\)-dimensional vector containing all observed unique values.
- \(M\) is a set of multiplicities; its elements are \(K\)-dimensional integer-valued vectors of the form \(\mathbf{m}= (m_1, \dots, m_K)\). Intuitively, the \(j\)-th entry of \(\mathbf{m}\) counts the occurrences of the \(j\)-th unique value, \(y_j^\ast\).
- \(w_\mathbf{m}\) is the weight associated with the vector \(\mathbf{m}\in M\). By definition, weights are positive and sum up to \(1\).
- \(\Pi_\alpha\) denotes the law of a Dirichlet process, as introduced by Ferguson.
- \(\alpha\) is the characterizing measure of a Dirichlet Process. In turn, it can be expressed as \(\alpha = \theta P_0\), where the real number \(\theta\) is the intensity, and the probability measure \(P_0\) gives the centering. For a wide review of the Dirichlet Process and its properties, refer to Ghosal and Vaart (2017).
- \(\delta_y\) is a Dirac measure that puts mass at the point \(y\).

The derivation of this model, which stems from the study of population genetics, exploits the concept of duality for Markov processes (Papaspiliopoulos and Ruggiero (2014)), applying it to the results of Ethier and Griffiths (1993), among others, on the seminal work of Fleming and Viot (1979).

In order to understand how to recover the previous expression, start by noting that, unconditionally on observed data, \[X_{t_0} \sim \Pi_{\alpha},\] where the time is arbitrarily set to \(t=t_0\). This means that, while no data is included within the model, the FVDDP can be fully characterized by \(\theta\) and \(P_0\). The creation of a process is carried out using the function `initialize()`.

The user has to specify the positive real number `theta` and two functions: the first to sample from \(P_0\) (`sampling.f`) and the second to evaluate its p.d.f. or p.m.f. (`density.f`), depending on whether \(P_0\) is atomic or not; this is specified by the last argument, `atomic`. The function returns an object of class `fvddp`.

```
FVDDP = initialize(theta = 1.28, sampling.f = function(x) rpois(x, 5),
                   density.f = function(x) dpois(x, 5), atomic = TRUE)
```

```
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Empty Process
```

In this chunk of code, for example, \(\theta = 1.28\) and \(P_0 \sim \mathrm{Po(5)}\). Note that when printing the process, it is explicitly stated that no data has been included within the model.

Updating the process with data collected at time \(t_0\) stored in the vector \(Y_0\), the form of the latent signal becomes \[ X_{t_0} \ | \ Y_0 \sim \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}. \] In some sense, this already is the mixture expressed in the general formula, under the specification that the vector \(\mathbf{y}^\ast\) collects the unique values observed at time \(t_0\), \(\mathbf{m}\) stores the multiplicities of \(Y_0\) with respect to \(\mathbf{y}^\ast\), and \(M = \{ \mathbf{m}\}\): this implies that \(w_\mathbf{m}= 1\).

The update is performed by means of the `update()` command, whose arguments are an `fvddp` object and a numeric vector. The returned object will include the information provided by \(Y_0\). In particular:

- \(\mathbf{y}^\ast\) is stored as an ordered vector in the attribute `y.star`.
- \(M\) is stored as an integer-valued matrix `M` of size \(|M| \times K\), where \(|M|\) denotes the cardinality of \(M\) and \(K\) is the length of \(\mathbf{y}^\ast\). Each vector \(\mathbf{m}\) is stored as a row of `M`.
- \(w\) is stored as a vector `w` of size \(|M|\). Its \(j\)-th element represents the weight associated with the \(j\)-th row of `M`.
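The call that produced the output below is not shown in the source; consistent with the text, which updates the signal with the vector \((4,7,7,9)\), it plausibly reads:

```
# Update the empty process with the observations (4, 7, 7, 9);
# the order of y.new does not matter
FVDDP = update(fvddp = FVDDP, y.new = c(4, 7, 7, 9))
```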

```
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 1 2 1
#>
#> Weights (w):
#> [1] 1
```

As one would expect, updating the signal with the vector \((4,7,7,9)\) (the order of the input does not matter) leads to a degenerate mixture with \(\mathbf{y}^\ast = (4,7,9)\) and \(M = \{ (1,2,1)\}\).

Updating a non-empty process will have a different effect: suppose we know the law of \[X_{t_n} \ | \ Y_0, \dots, Y_{n-1} \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(Y_j\) denotes a vector of values observed at time \(t_j\); then \[X_{t_n} \ | \ Y_0, \dots, Y_n \sim \sum_{\mathbf{m} \in (M + \mathbf{n})} w_\mathbf{m}^\phi \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(\mathbf{n}= (n_1, \dots, n_K)\) is the vector of multiplicities of \(Y_n\) according to the unique values collected up to the same time, the new weights are such that \[w_{\mathbf{m}}^\phi \propto w_\mathbf{m}\mathrm{PU}(\mathbf{n}|\mathbf{m}),\] where \(\mathrm{PU}(\mathbf{n}|\mathbf{m})\) denotes the probability of drawing a vector of multiplicities \(\mathbf{n}\) starting from \(\mathbf{m}\) via the Pólya urn sampling scheme under the \(\theta\) and \(P_0\) specified by the model, and \[M + \mathbf{n}= \{ \mathbf{m}+ \mathbf{n}: \mathbf{m}\in M\}.\] Hence, the following changes will be applied to the input process of the function:

- if new values are observed, they are included in `y.star`.
- the vector \(\mathbf{n}\) will be added to each row of `M`.
- the vector \(w\) will be appropriately modified and normalized in order to sum up to one.

For the details of the role of the Pólya urn scheme in the update of mixtures of Dirichlet Processes, see Antoniak (1974) and Blackwell and MacQueen (1973).
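As a minimal numerical sketch of this reweighting step in plain R (the Pólya urn probabilities below are hypothetical placeholders, not values computed by the package):

```
# Hypothetical Polya urn probabilities PU(n | m), one per mixture component
pu <- c(0.12, 0.05, 0.30)
# Current mixture weights w_m
w <- c(0.50, 0.25, 0.25)
# Updated weights: proportional to w * PU, then normalized to sum to one
w.phi <- (w * pu) / sum(w * pu)
```

Note how the component with the largest Pólya urn probability gains weight after the update.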

The propagation of the signal, also known as prediction, aims to estimate the state of the process at a time after the data is collected: in other words, in the future. While updating the signal yields \(X_{t_n} \ | \ Y_0, \dots, Y_n\), the propagation leads to \[X_{t_n + t}\ |\ Y_0, \dots, Y_n \sim \sum_{\mathbf{n} \in L(M)} w_\mathbf{n}^\psi \Pi_{\alpha + \sum_{j=1}^K n_j \delta_{y_j^\ast}}.\] This means that the probability mass is shifted to the set \[L(M) = \{ \mathbf{n}\in \mathbb{Z}_{\geq 0}^K : \mathbf{n}\leq \mathbf{m}\ \text{for some} \ \mathbf{m}\in M\},\] where the notation \(\mathbf{n}\leq \mathbf{m}\) means that \(n_j \leq m_j \ \forall j \in \{1, \dots, K\}\). The new weights are such that \[w_\mathbf{n}^\psi = \sum_{\mathbf{m}\in M : \mathbf{n}\leq \mathbf{m}} w_\mathbf{m}p_{\mathbf{m}, \mathbf{n}}(t)\] and \(p_{\mathbf{m}, \mathbf{n}}(t)\) represents the probability of reaching \(\mathbf{n}\) starting from \(\mathbf{m}\) in a time \(t\) for a \(K\)-dimensional death process, as shown in Tavaré (1984); the exact value of such probability, stated in Papaspiliopoulos, Ruggiero, and Spanò (2016), is \[p_{\mathbf{m}, \mathbf{n}}(t)= \begin{cases} e^{-\lambda_{|\mathbf{m}|}t} \quad &\text{if} \ \mathbf{n}= \mathbf{m}\\ C_{|\mathbf{m}|, |\mathbf{n}|}(t) \mathrm{MVH} (\mathbf{n}; |\mathbf{n}|, \mathbf{m}) \quad &\text{if} \ \mathbf{n}< \mathbf{m}\\ \end{cases}\] where \(\lambda_{|\mathbf{m}|} = \frac{|\mathbf{m}|(\theta + |\mathbf{m}| -1)}{2}\), \[C_{|\mathbf{m}|, |\mathbf{n}|}(t) = \big(\prod_{h=|\mathbf{n}| + 1}^{|\mathbf{m}|} \lambda_{h} \big) (-1)^{|\mathbf{m}|-|\mathbf{n}|} \sum_{k=|\mathbf{n}|}^{|\mathbf{m}|} \frac{e^{-\lambda_{k} t}}{\prod_{|\mathbf{n}| \leq h \leq |\mathbf{m}|, h \neq k }(\lambda_{k} - \lambda_{h})},\] \(\mathrm{MVH}\) denotes the multivariate hypergeometric probability mass function, and \(|\mathbf{m}|\) represents the \(L^1\) norm (i.e. the sum) of the vector \(\mathbf{m}\).
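To make the death-process transition concrete, \(p_{\mathbf{m}, \mathbf{n}}(t)\) can be coded directly from the formulas above. The helper below is a plain-R sketch for illustration only; `lambda` and `p.death` are hypothetical names, not functions exported by FVDDPpkg.

```
# lambda_n = n * (theta + n - 1) / 2: death rates of the process
lambda <- function(n, theta) n * (theta + n - 1) / 2

# Transition probability p_{m,n}(t) of the K-dimensional death process
p.death <- function(m, n, t, theta) {
  M <- sum(m); N <- sum(n)
  if (any(n > m)) return(0)                       # only n <= m is reachable
  if (N == M) return(exp(-lambda(M, theta) * t))  # here n = m
  ks <- N:M
  # C_{|m|,|n|}(t): coefficient from the partial-fraction expansion
  C <- prod(lambda((N + 1):M, theta)) * (-1)^(M - N) *
    sum(sapply(ks, function(k) {
      exp(-lambda(k, theta) * t) /
        prod(lambda(k, theta) - lambda(ks[ks != k], theta))
    }))
  # MVH(n; |n|, m): multivariate hypergeometric probability
  C * prod(choose(m, n)) / choose(M, N)
}
```

Summing `p.death(m, n, t, theta)` over all \(\mathbf{n}\leq \mathbf{m}\) gives \(1\), as expected for the transition distribution of the death process started at \(\mathbf{m}\).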

The `propagate()` function can be exploited to propagate the signal. Its arguments are an `fvddp` object and a (positive) time. The result is the propagated process, whose matrix `M` will be larger and whose weights will be as described in the formulae above.
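The propagation call that produced the output below is not shown in the source; using the time \(t = 0.6\) mentioned afterwards, it plausibly reads:

```
# Propagate the updated signal forward by t = 0.6
FVDDP = propagate(fvddp = FVDDP, delta.t = 0.6)
```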

```
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 0 0 0
#> [2,] 1 0 0
#> [3,] 0 1 0
#> [4,] 1 1 0
#> [5,] 0 2 0
#> [6,] 1 2 0
#> [7,] 0 0 1
#> [8,] 1 0 1
#> [9,] 0 1 1
#> [10,] 1 1 1
#> [11,] 0 2 1
#> [12,] 1 2 1
#>
#> Weights (w):
#> [1] 0.060274058 0.099035520 0.198071041 0.142898157 0.071449079 0.027252056
#> [7] 0.099035520 0.071449079 0.142898157 0.054504111 0.027252056 0.005881167
```

The example shows the propagation of the signal introduced in the previous section, with \(t = 0.6\). Note that `y.star` does not vary. Also note that in examples like this, a time \(t \simeq 2\) is sufficient to shift almost all the mass to the component of the mixture characterized by the zero vector.

```
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#> <bytecode: 0x12b7a6180>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 5 7 9 10
#>
#> Multiplicities (M):
#> 4 5 7 9 10
#> [1,] 1 1 2 0 2
#> [2,] 2 1 2 0 2
#> [3,] 1 1 3 0 2
#> [4,] 2 1 3 0 2
#> [5,] 1 1 4 0 2
#> [6,] 2 1 4 0 2
#> [7,] 1 1 2 1 2
#> [8,] 2 1 2 1 2
#> [9,] 1 1 3 1 2
#> [10,] 2 1 3 1 2
#> [11,] 1 1 4 1 2
#> [12,] 2 1 4 1 2
#>
#> Weights (w):
#> [1] 0.032822343 0.051700651 0.302672433 0.327846322 0.083102652 0.061084451
#> [7] 0.009482191 0.010270845 0.060128867 0.044197613 0.011203233 0.005488398
```

The latter chunk shows an application of `update()` on a larger process. A larger vector `y.new` may induce large variations in the weights; since here it has size \(3\), the example does not cause an immediately recognizable effect.

In the theory of Hidden Markov Models, the smoothing operator is used to infer the state of the signal given observations from the past, the present and the future. In other words, one can estimate \(X_t\) when \(t \leq t_n\), exploiting all collected data.

To do so, it has been shown by Ascolani, Lijoi, and Ruggiero (2023) that it is required to create two processes. The first has to be propagated forward from \(t_0\) to \(t_{i-1}\) as in the previous sections; the second one has to be run backward using the same strategy: initialize and update it at \(t_n\), propagate it towards \(t_{n-1}\) (with the positive time \(t_n - t_{n-1}\) in the function), and sequentially update and propagate until \(t_{i+1}\) is reached.

Doing this, one will get that \[X_{t_{i-1}} \ |\ Y_0, \dots, Y_{i-1} \sim \sum_{\mathbf{n}_{i-1} \in M_{i-1}} u_{\mathbf{n}_{i-1}} \Pi_{\alpha + \sum_{j=1}^K n_{i-1,j}\delta_{y_j^\ast}}\] and \[X_{t_{i+1}} \ |\ Y_{i+1}, \dots, Y_{n} \sim \sum_{\mathbf{n}_{i+1}\in M_{i+1}} v_{\mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K n_{i+1,j}\delta_{y_j^\ast}}\] where the subscripts \(i-1\) and \(i+1\) are necessary to specify elements from the past or the future mixture, \(v\) stands for the weights, and for example \(n_{i-1, j}\) is the \(j\)-th component of the vector \(\mathbf{n}_{i-1}\) (and similarly for \(\mathbf{n}_{i+1}\)).

Given this description based on available data from the past and the future, call \(\mathbf{n}_i\) the multiplicities generated by the vector \(Y_i\). Then

\[X_{t_i} \ |\ X_{t_{i-1}}, X_{t_{i+1}}, Y_i \sim \sum_{\substack{\mathbf{n}_{i-1}\\ \in M_{i-1}}}\sum_{\substack{\mathbf{n}_{i+1}\\ \in M_{i+1}}} u_{\mathbf{n}_{i-1}} v_{\mathbf{n}_{i+1}} \sum_{\substack{(\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) \\ \in D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}}} w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K (k_{i-1, j} + n_{i,j} + k_{i+1, j})\delta_{y_j^\ast}}\] where:

if \(P_0\) is non-atomic, define the sets \[ D_{i-1} := \{ j \in \{ 1, \dots, K\} : n_{i-1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i+1,j}>0 \},\] \[D_{i+1} := \{ j \in \{ 1, \dots, K\} : n_{i+1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i-1,j}>0 \}\] and \[ S := D_{i-1} \cup D_{i+1}\] to express the indices of shared values among different times. Then \[\begin{align*} D^{\mathbf{n}_{i-1},\mathbf{n}_{i+1}} = \{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : &\mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}\ \text{and} \ k_{i-1, j} > 0 \ \forall \ j \in D_{i-1},\\ &\mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\ \text{and} \ k_{i+1, j} > 0 \ \forall \ j \in D_{i+1} \} \end{align*}\] and the weights are such that:

- if \(S = \emptyset\): \[w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{\theta^{(|\mathbf{k}_{i-1}|)} \theta^{(|\mathbf{k}_{i+1}|)}}{(\theta + |\mathbf{n}_i|)^{(|\mathbf{k}_{i-1}|+|\mathbf{k}_{i+1}|)}}\]
- if \(S \neq \emptyset\): \[w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{\theta^{(|\mathbf{k}_{i-1}|)} \theta^{(|\mathbf{k}_{i+1}|)}}{(\theta + |\mathbf{n}_i|)^{(|\mathbf{k}_{i-1}|+|\mathbf{k}_{i+1}|)}} \\ \times \prod_{j \in S}\frac{(k_{i-1, j} + n_{i,j} + k_{i+1,j}-1)!}{(k_{i-1,j}-1)! (n_{i,j}-1)! (k_{i+1,j}-1)!} \] if \((\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) \in D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\), and \(0\) otherwise, under the convention that \((-1)!=1\).

if \(P_0\) is atomic, let \[D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}:=\{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : \mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}, \mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\}\] and the weights can be expressed as \[ w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{m(\mathbf{k}_{i-1}+ \mathbf{n}_i +\mathbf{k}_{i+1})}{m(\mathbf{k}_{i-1})m(\mathbf{n}_i)m(\mathbf{k}_{i+1})}\]

where \[\tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} = p_{\mathbf{n}_{i-1}, \mathbf{k}_{i-1}}(t_i - t_{i-1})p_{\mathbf{n}_{i+1}, \mathbf{k}_{i+1}}(t_{i+1} - t_i)\] is the joint transition probability from \(\mathbf{n}_{i-1}\) to \(\mathbf{k}_{i-1}\) in time \(t_i - t_{i-1}\) and from \(\mathbf{n}_{i+1}\) to \(\mathbf{k}_{i+1}\) in time \(t_{i+1} - t_i\) and \(m(\cdot)\) is the marginal likelihood function of multiplicities in the atomic case.

This peculiar structure, developed in Ascolani, Lijoi, and Ruggiero (2023), can be better understood with some examples, which can be run in this implementation with `smooth()`. The arguments are two latent signals (`fvddp.past` and `fvddp.future`), the positive times \(t_i - t_{i-1}\) (`t.past`) and \(t_{i+1} - t_i\) (`t.future`), and the data collected at time \(t_i\) (`y.new`).

```
FVDDP_NONATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 4, 7),
                             density.f = function(x) dbeta(x, 4, 7), atomic = FALSE)
FVDDP_PAST_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210, 0.635, 0.541))
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210))
FVDDP_FUTURE_NONATOMIC = propagate(fvddp = FVDDP_FUTURE_NONATOMIC, delta.t = 0.4)
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_FUTURE_NONATOMIC, y.new = c(0.635))
```

In the example above, two processes were created with \(\theta = 0.7\) and \(P_0 \sim \mathrm{Beta}(4, 7)\). The signal was updated once in the past, and twice in the future (with a propagation between the two updates).

```
FVDDP_SMOOTH_NONATOMIC = smooth(fvddp.past = FVDDP_PAST_NONATOMIC, fvddp.future = FVDDP_FUTURE_NONATOMIC,
                                t.past = 0.75, t.future = 0.3, y.new = c(0.210, 0.635, 0.479))
```

```
FVDDP_SMOOTH_NONATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 4, 7)
#>
#> P0.density/mass: function(x) dbeta(x, 4, 7)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] 0.210 0.479 0.541 0.635
#>
#> Multiplicities (M):
#> 0.21 0.479 0.541 0.635
#> [1,] 2 1 0 3
#> [2,] 2 1 1 3
#> [3,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.23344663 0.03392615 0.73262722
```

Using the function on the two processes, it is possible to see that the structure described above for the nonatomic case causes a shrinkage of the mixture. Indeed, the set \(M\) only contains three vectors.

In order to make a comparison, one can try to do something similar taking \(P_0 \sim \mathrm{Binom}(10, 0.6)\).

```
FVDDP_ATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbinom(x, 10, 0.6),
                          density.f = function(x) dbinom(x, 10, 0.6), atomic = TRUE)
FVDDP_PAST_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2, 6, 5))
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2))
FVDDP_FUTURE_ATOMIC = propagate(fvddp = FVDDP_FUTURE_ATOMIC, delta.t = 0.4)
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_FUTURE_ATOMIC, y.new = c(6))
```

As before, the mixture referring to past observations is updated once, while the one referring to future observations is updated twice.

```
FVDDP_SMOOTH_ATOMIC = smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
                             t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4))
```

```
FVDDP_SMOOTH_ATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbinom(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0001721823 0.0025036150 0.0094628016 0.0002320301 0.0024384330
#> [6] 0.0068287189 0.0004682311 0.0037328174 0.0075867038 0.0117827759
#> [11] 0.1266737735 0.3103950448 0.0112750878 0.0913256329 0.1717719912
#> [16] 0.0153587293 0.0979424891 0.1300489424
```

In this case, the mixture is clearly larger. The reason is that when \(P_0\) is atomic, the set \(D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\) does not put constraints based on the appearance of shared values across different times.

Past examples should also provide an insight into the main issue related to `propagate()` and `smooth()`: the size of the matrix \(M\) grows polynomially with respect to the amount of collected data; moreover, as the number of weights increases, many of them become almost negligible.

In order to avoid long computations, it is possible to use `approx.propagate()`: it reproduces the propagation of the signal by means of a Monte Carlo method. The idea, proposed by Ascolani, Lijoi, and Ruggiero (2021), is to mimic the evolution of the \(K\)-dimensional death process using a one-dimensional one, and then to extract a multidimensional vector from a multivariate hypergeometric distribution.

```
FVDDP = initialize(theta = 3, sampling.f = function(x) rnorm(x, -1, 3),
                   density.f = function(x) dnorm(x, -1, 3), atomic = FALSE)
FVDDP = update(fvddp = FVDDP, y.new = c(-1.145, 0.553, 0.553, 0.553))
```

In the previous chunk, a process with hyperparameters \(\theta = 3\) and \(P_0 \sim \mathcal{N}(-1, 3)\) was created and updated. The syntax of the approximating function is the same as in `propagate()`, with the exception that one must also specify the number of samples `N` to be drawn.
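The chunk that generated `FVDDP_APPR_PROP` is not shown in the source; assuming the propagation time \(t = 0.45\) used later and a sample size of \(20000\) (an assumption, not stated in the source), it would read:

```
# Approximate propagation by t = 0.45 via Monte Carlo;
# N = 20000 is an assumed sample size
FVDDP_APPR_PROP = approx.propagate(fvddp = FVDDP, delta.t = 0.45, N = 20000)
```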

```
FVDDP_APPR_PROP
#> theta: 3
#>
#> P0.sample: function(x) rnorm(x, -1, 3)
#>
#> P0.density/mass: function(x) dnorm(x, -1, 3)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] -1.145 0.553
#>
#> Multiplicities (M):
#> -1.145 0.553
#> [1,] 0 0
#> [2,] 0 1
#> [3,] 0 2
#> [4,] 0 3
#> [5,] 1 0
#> [6,] 1 1
#> [7,] 1 2
#> [8,] 1 3
#>
#> Weights (w):
#> [1] 0.13530 0.33260 0.17440 0.02000 0.10310 0.17210 0.05785 0.00465
```

The result is again an `fvddp` object. In order to measure the accuracy of such an approximation, one has to compute the exact output of the propagation, again with time \(t=0.45\).
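A call consistent with this description, reusing `propagate()` on the same process:

```
# Exact propagation with the same time, for comparison
FVDDP_EXACT_PROP = propagate(fvddp = FVDDP, delta.t = 0.45)
```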

Then one can measure the difference in the weights with `error.estimate()`. The arguments are `fvddp.exact` and `fvddp.approx`, and the output is a vector containing the differences among the weights, in absolute value. The option `remove.unmatched` allows one to choose what happens whenever a vector is in the exact propagation but not in the approximate one: if `TRUE`, the missing weight is assumed to be \(0\); if `FALSE`, the comparison will not be reported in the output (which will hence be shorter).

```
error.estimate(FVDDP_EXACT_PROP, FVDDP_APPR_PROP)
#> [1] 0.0058286503 0.0068326918 0.0028019247 0.0014386014 0.0008613986
#> [6] 0.0015530747 0.0001989751 0.0001334191
```

Something similar can be done for the smoothing via `approx.smooth()`; in this case the Monte Carlo method is necessary to support importance sampling, where the importance weights are given by the right-hand side of the formulae for \(w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\). For this reason, the result of the simulation may be less stable than in the case of the propagation seen above, and a larger number of samples will be required to achieve good accuracy.

In the following example, one can see how to replicate what was done in the exact smoothing.

```
FVDDP_SMOOTH_APPR = approx.smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
                                  t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4), N = 50000)
```

```
FVDDP_SMOOTH_APPR
#> theta: 0.7
#>
#> P0.sample: function(x) rbinom(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0002318445 0.0032076427 0.0117206721 0.0002089701 0.0021740870
#> [6] 0.0057754325 0.0003174570 0.0025240938 0.0051318324 0.0161197332
#> [11] 0.1568137178 0.3993952863 0.0088537112 0.0776609913 0.1427394035
#> [16] 0.0102407155 0.0676997460 0.0891846631
```

```
error.estimate(FVDDP_SMOOTH_ATOMIC, FVDDP_SMOOTH_APPR)
#> [1] 5.966216e-05 7.040276e-04 2.257871e-03 2.305999e-05 2.643460e-04
#> [6] 1.053286e-03 1.507740e-04 1.208724e-03 2.454871e-03 4.336957e-03
#> [11] 3.013994e-02 8.900024e-02 2.421377e-03 1.366464e-02 2.903259e-02
#> [16] 5.118014e-03 3.024274e-02 4.086428e-02
```

Another tool to cut the computational cost of predictive or smoothing inference is given by the `prune()` function. It allows one to remove from the mixture all vectors \(\mathbf{m}\) whose weight \(w_\mathbf{m}\) is below some threshold \(\varepsilon\). The remaining weights are then normalized so that their sum is \(1\).

In the example, the function will be applied to one of the processes previously calculated, fixing \(\varepsilon = 10^{-2}\).
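The call is not shown in the source; given the output that follows, it plausibly applies `prune()` to the exact smoothing result (the name of the threshold argument, `eps`, is an assumption):

```
# Prune components with weight below 1e-2 and renormalize;
# the threshold argument name is assumed, not confirmed by the source
PRUNED = prune(fvddp = FVDDP_SMOOTH_ATOMIC, eps = 1e-2)
```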

```
PRUNED
#> theta: 0.7
#>
#> P0.sample: function(x) rbinom(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 2 1 0 1
#> [2,] 2 1 0 2
#> [3,] 2 1 0 3
#> [4,] 2 1 1 1
#> [5,] 2 1 1 2
#> [6,] 2 1 1 3
#> [7,] 2 1 2 1
#> [8,] 2 1 2 2
#> [9,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.01219024 0.13105433 0.32112895 0.01166500 0.09448380 0.17771211 0.01588986
#> [8] 0.10132948 0.13454622
```

In this context, the threshold is extremely high; this is done for the sole purpose of showing how the function works. In the practical use of the package, a reasonable \(\varepsilon\) lies between \(10^{-9}\) and the machine epsilon of the computer in use.

The last task that can be performed is sampling values from Fleming-Viot dependent Dirichlet Processes. This can be done by choosing a vector \(\mathbf{m}\in M\) with probability \(w_\mathbf{m}\), and then drawing a value from \(P_0\) with probability \(\frac{\theta}{\theta + |\mathbf{m}|}\) or choosing \(y_j^\ast\) with probability \(\frac{m_j}{\theta + |\mathbf{m}|}\). To get a sample of size \(N\), it is sufficient to replicate this mechanism \(N\) times.

The implementation is named `posterior.sample()`. Its arguments are the signal and the number `N` of values to draw.
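The chunk that generated `y` is not shown in the source; consistent with the table below, which counts \(100\) sampled values, it plausibly reads (the choice of the propagated process is an assumption):

```
# Draw N = 100 values from the approximately propagated signal
y = posterior.sample(fvddp = FVDDP_APPR_PROP, N = 100)
```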

```
table(round(y, 3))
#>
#> -8.67 -7.232 -7.197 -6.912 -6.764 -5.968 -5.659 -5.249 -5.19 -5.114 -5.105
#> 1 1 1 1 1 1 1 1 1 1 1
#> -4.826 -4.375 -4.326 -4.21 -4.073 -3.836 -3.728 -3.562 -3.428 -2.793 -2.746
#> 1 1 1 1 1 1 1 1 1 1 1
#> -2.304 -2.162 -2.04 -1.687 -1.624 -1.555 -1.447 -1.286 -1.21 -1.145 -1.07
#> 1 1 1 1 1 1 1 1 1 3 1
#> -0.965 -0.706 -0.656 -0.561 -0.465 -0.462 -0.298 -0.256 -0.134 -0.059 -0.032
#> 1 1 1 1 1 1 1 1 1 1 1
#> 0.128 0.208 0.369 0.384 0.444 0.553 0.652 0.817 0.95 1.014 1.036
#> 1 1 1 1 1 26 1 1 1 1 1
#> 1.092 1.367 1.442 1.535 1.603 2.016 2.233 2.29 2.341 2.447 2.766
#> 1 1 1 1 1 1 1 1 1 1 1
#> 2.959 3.393 3.893 4.828 4.854 5.744 5.833
#> 1 1 1 1 1 1 1
```

The command `table()` was used here to display more efficiently how many times each value has been sampled.

In the Bayesian nonparametric framework, scientists prefer to use the predictive structure of the Dirichlet process when they want to picture what future observations will look like. This choice is strongly related to the exchangeability assumption underlying the model (more in Ghosal and Vaart (2017)); in this context, however, it is sufficient to say that the predictive structure is nothing but the sequential use of posterior sampling and updating. In fact, a value is repeatedly drawn from the mixture and incorporated within each vector \(\mathbf{m}\in M\) via an update. A full description of this mechanism was developed by Ascolani, Lijoi, and Ruggiero (2021).

This is implemented efficiently via `predictive.struct()`; the arguments are the same as in `posterior.sample()`.
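A minimal usage sketch, mirroring the call pattern of `posterior.sample()` (the choice of process and sample size are arbitrary):

```
# Draw N = 100 values using the predictive structure of the signal
y.pred = predictive.struct(fvddp = FVDDP_APPR_PROP, N = 100)
```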

Antoniak, Charles E. 1974. “Mixtures of
Dirichlet Processes with Applications to Bayesian Nonparametric
Problems.” *The Annals of Statistics* 2 (6):
1152–74. https://doi.org/10.1214/aos/1176342871.

Ascolani, Filippo, Antonio Lijoi, and Matteo Ruggiero. 2021.
“Predictive inference with
Fleming–Viot-driven dependent Dirichlet processes.”
*Bayesian Analysis* 16 (2): 371–95. https://doi.org/10.1214/20-BA1206.

———. 2023. “Smoothing distributions for
conditional Fleming–Viot and Dawson–Watanabe diffusions.”
*Bernoulli* 29 (2): 1410–34. https://doi.org/10.3150/22-BEJ1504.

Blackwell, David, and James B. MacQueen. 1973. “Ferguson
Distributions Via Polya Urn Schemes.” *The Annals of
Statistics* 1 (2): 353–55. https://doi.org/10.1214/aos/1176342372.

Ethier, S. N., and R. C. Griffiths. 1993. “The Transition Function of a Fleming-Viot
Process.” *The Annals of Probability* 21 (3):
1571–90. https://doi.org/10.1214/aop/1176989131.

Fleming, Wendell H., and Michel Viot. 1979. “Some Measure-Valued
Markov Processes in Population Genetics Theory.” *Indiana
University Mathematics Journal* 28 (5): 817–43. http://www.jstor.org/stable/24892583.

Ghosal, Subhashis, and Aad van der Vaart. 2017. *Fundamentals of
Nonparametric Bayesian Inference*. Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press. https://doi.org/10.1017/9781139029834.

Papaspiliopoulos, Omiros, and Matteo Ruggiero. 2014. “Optimal
Filtering and the Dual Process.” *Bernoulli* 20 (4). https://doi.org/10.3150/13-bej548.

Papaspiliopoulos, Omiros, Matteo Ruggiero, and Dario Spanò. 2016.
“Conjugacy properties of time-evolving
Dirichlet and gamma random measures.” *Electronic
Journal of Statistics* 10 (2): 3452–89. https://doi.org/10.1214/16-EJS1194.

Tavaré, Simon. 1984. “Lines-of-Descent and Genealogical Processes,
and Their Applications in Population Genetics Models.”
*Advances in Applied Probability* 16 (1): 27–27. https://doi.org/10.1017/S000186780002228X.