The `anchoredDistr`

package is intended to handle the post-processing of projects created by MAD#, the software implmentation of the Method of Anchored distributions (see our Codeplex site). Similarly structured data not extracted from the MAD# software can also be used with some formatting.

However, the package is in early development and as such only has the following features:

- Reading the MAD# databases into a
`MADproject`

S4 class object - Calculating non-parametric likelihood values using the
`np`

package - Testing convergence of likelihood values as a function of the number of realizations
- Calculating posterior distributions
- Generating plots for the observations, realizations, and posteriors

while making the following assumptions:

- There is either multiple measurement locations with no time dependence or only one measurement location with a time series of data.
- The inversion data supported consist of the multi-well measurements, a subset of timesteps from the single measurement location’s time series, a parameter-less function (e.g.
`min`

) of that time series, or a function to be fitted to the time series (e.g.`matern`

).

This vignette will step through an example of applying `anchoredDistr`

using the dataset `pumping`

, which contains results from a MAD# project pertaining to characterizing an aquifer’s mean natural-log hydraulic conductivity by using a time series of drawdown (change in hydraulic head) at a monitoring well in the field as inversion data. The `MADproject`

object has slots `numSamples`

, `numAnchors`

, `numTheta`

, `observations`

, `priors`

, `true values`

, `numLocations`

and `realizations`

filled. Normally, the function `readMAD`

would be used to fill in the object from databases produced by MAD#, but this has done already for `pumping`

for a more portable example. However, MAD# is not entirely necessary: you can fill in a `MADproject`

object with data from another application and still apply the MAD analysis.

For example, we can install and load the package:

`install.packages(anchoredDistr)`

`library(anchoredDistr)`

and then create a MADproject object given the following minimum information:

Observations of inversion data: For the pumping example, there is one measurement location with 100 time steps. The format required is a vector of length (number of time steps).

Prior distribution samples: The samples of the prior distributions for each strucutural parameter and anchor used. For the pumping example, there are 50 samples of one parameter and no anchors. The required format is a data.frame with columns

`sid`

(the sample ID),`priordens`

(the marginal density associated with the sample),`tid`

(the parameter ID),`name`

(the parameter name), and`priorvalue`

(the sampled value of the parameter).Realizations: Simulated values of the inversion data based on each of the prior samples. For the pumping example. The required format is a data.frame with columns

`sid`

(the sample ID),`rid`

(the realization ID),`zid`

(the inversion data ID), and`value`

(the simulated value). Below shows the configuration of raw data in the`pumpingInput`

data set and how it can be used to create a MADproject object without a MAD# database.

```
load(system.file("extdata", "pumpingInput.RData", package = "anchoredDistr"))
head(obs)
```

`## [1] 0.000000 -6.771288 -16.222989 -25.283413 -32.980097 -39.244209`

`head(realizations)`

```
## sid rid zid value
## 200001 1 1 1 -8.378644
## 200002 1 2 1 -8.569131
## 200003 1 3 1 -8.720720
## 200004 1 4 1 -11.672956
## 200005 1 5 1 -9.151426
## 200006 1 6 1 -8.188874
```

`head(priors)`

```
## sid priordens tid name priorvalue
## 2 1 0.3333 1 Mean -9.0
## 3 2 0.3333 1 Mean -9.5
## 4 3 0.3333 1 Mean -10.0
```

```
proj <- new("MADproject",
numLocations = 1,
numTimesteps = 100,
numSamples = 50,
numAnchors = 0,
numTheta = 1,
observations = obs,
realizations = realizations,
priors = priors)
```

However, the same data, plus more information, is available in the `pumping`

dataset so it will be used for the remainder of the vignette.

`data(pumping)`

`MADproject`

informationYou can use the `print`

function to preview what the `MADproject`

object contains:

`print(pumping)`

```
## NAMES:
## The MAD project name: pumping
## The MAD result name: example2
## The path for the MAD project:
##
## CONFIGURATION VALUES:
## The number of measurement locations: 1
## The number of time steps: 100
## The number of samples: 3
## The number of anchors: 0
## The number of structural parameters: 1
##
## DATA DIMENSIONS:
## Size of true values: 1 2
## Size of observations: 100
## Size of realizations: 30000 4
## Size of priors: 3 5
## Size of likelihoods: 0 0
## Size of posteriors: 0 0
```

The `plotMAD`

function can be used to view different data from the `MADproject`

object. Pass the object as the first argument followed by

- nothing: yields all available plots given data
`"observations"`

: yields a plot of the observation as a function of time step.`"realizations"`

: yields a plot of the samples’ realizations as a polygon representing the interquartile ranges of the realization values as a function of the time steps. Only works if`@numSamples`

is less than six. The observations is also plotted for comparison.`"posterior"`

: yields the marginal posterior distributions for the samples.`"prior"`

: yields the marginal prior distributions for the samples.

Below is an example of requesting to plot the realizations for the `pumping`

dataset.

`plotMAD(pumping, "realizations")`

The `anchoredDistr`

package can take this information and calculate the posterior of the random parameter in question based on requested inversion data. Below, the 100th time step is used as the inversion data, then the posterior is calculated and plotted (again, using the `plotMAD`

function).

```
pumping <- calcLikelihood(pumping, 100)
pumping <- calcPosterior(pumping)
```

`plotMAD(pumping, "posteriors")`

The `anchoredDistr`

package can take this information and calculate the posterior of the random parameter in question based on requested inversion data. Below, the minimum value in the time series is used as the inversion data, then the posterior is calculated and plotted (again, using the `plotMAD`

function). This is the same as using the 100th time step, but showcasing the ability to provide a function instead of a subset for the reduction.

```
pumping.min <- reduceData(pumping, min)
pumping.min <- calcLikelihood(pumping.min)
pumping.min <- calcPosterior(pumping.min)
```

`plotMAD(pumping.min, "posteriors")`

Even more complicated functions can be passed. For example, this `matern`

function:

```
matern <- function(x, params){
sigma <- params[1]
lambda <- params[2]
kappa <- params[3]
t <- sqrt(2*kappa)*x/lambda
cov <- ((sigma*(t^kappa)/gamma(kappa))*2^(1-kappa))*besselK(t,kappa)
return(sigma-cov)
}
```

If we want to fit this `matern`

function to the time series, we need to provide `nls`

with initial values for the three parameters. Here is a function to estimate these initial values:

```
init.matern <- function(x){
params<- c()
params[1] <- min(x)
params[2] <- min(10, tail(which(x > 0.3*min(x)),1))
params[3] <- 0.5
return(params)
}
```

We can pass these two functions to `reduceData`

for fitting a matern model to each time series and performing the inversion with the three parameters:

`pumping.matern <- reduceData(pumping, matern, init.matern, lower=c(-Inf,1,0.1), upper=c(0,100,5), algorithm="port")`

`plotMAD(pumping.matern, "realizations")`

```
pumping.matern <- calcLikelihood(pumping.matern)
pumping.matern <- calcPosterior(pumping.matern)
```

`plotMAD(pumping.matern, "posteriors")`

In order to assess the convergence of the likelihood values, you can call the `testConvergence`

function that will take a MADproject object and calculate likelihood values for a range of realization counts.

`testConvergence(pumping.matern)`