Automating preprocessing often requires evaluating computationally costly combinations of preprocessing steps. This package helps to find near-best combinations faster. Supported metaheuristics are taboo search, simulated annealing, reheating and late acceptance. Start conditions include random and grid starts. End conditions include completion of all iteration rounds, reaching an objective threshold and convergence. Metaheuristics, start and end conditions can be hybridized and their hyperparameters optimized. Parallel computation is supported. The package is intended to be used with package ‘preprocomb’ and takes its ‘GridClass’ object as input.
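As background, the acceptance rules behind these metaheuristics are simple. Below is a minimal sketch of a simulated-annealing acceptance rule; it is illustrative only and not the package's internal implementation (the `temperature` parameter and its interpretation here are assumptions for demonstration):

```r
# Illustrative simulated-annealing acceptance rule, not metaheur's internals.
# Improvements are always accepted; worse candidates are accepted with a
# probability that decays as the temperature drops.
accept_candidate <- function(current, candidate, temperature) {
  delta <- candidate - current            # change in classification accuracy
  if (delta > 0) return(TRUE)             # always accept improvements
  if (temperature <= 0) return(FALSE)     # greedy behavior once cooled down
  runif(1) < exp(delta / temperature)     # accept worse moves with decaying probability
}

accept_candidate(0.80, 0.85, temperature = 0.1)  # improvement, so TRUE
accept_candidate(0.85, 0.80, temperature = 0)    # worse move at zero temperature, so FALSE
```

At temperature 0 this reduces to a greedy hill-climber, which matches the `Temperature: 0` lines seen in the search logs later in this document.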

Let’s start by adding contaminations to the iris data to simulate the need for preprocessing:

```
set.seed(1)
testdata <- iris
testdata[sample(1:150,40),3] <- NA # add missing values to the third variable
testdata[,4] <- rnorm(150, testdata[,4], 2) # add noise to the fourth variable
testdata$Irrelevant <- runif(150, 0, 1) # add an irrelevant feature
```
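A quick sanity check confirms that the contaminations took effect:

```r
# Reproducing the contaminated data from above for completeness:
set.seed(1)
testdata <- iris
testdata[sample(1:150, 40), 3] <- NA
testdata[, 4] <- rnorm(150, testdata[, 4], 2)
testdata$Irrelevant <- runif(150, 0, 1)

sum(is.na(testdata[, 3]))  # 40 injected missing values in the third variable
ncol(testdata)             # 6 columns: four measurements, Species and Irrelevant
```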

A grid of 540 preprocessing combinations, and the corresponding preprocessed data sets to search over, is created with package preprocomb and its setgrid() function.

```
library(preprocomb)
examplegrid <- preprocomb::setgrid(phases=c("imputation", "smoothing", "scaling", "outliers", "selection"), data=testdata)
```

The metaheuristic search is controlled by the metaheur() function. The example below runs 54 iterations (10% of the search space) with classification accuracy estimated by 20 times repeated holdout validation. The classifier is the support vector machine svmRadial from the kernlab package (assumed to be loaded).

`library(metaheur)`

`examplesearch <- metaheur(examplegrid, model="svmRadial", iterations = 54, nholdout = 20)`
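The repeated holdout validation behind the `nholdout` argument can be sketched in base R. The sketch below uses a simple nearest-centroid classifier in place of svmRadial and a 2/3 training split; both are assumptions for illustration, not the package's actual implementation:

```r
# Illustrative n-times repeated holdout validation; the nearest-centroid
# classifier stands in for kernlab's svmRadial.
holdout_accuracy <- function(data, n = 20, train_frac = 2/3) {
  replicate(n, {
    idx   <- sample(nrow(data), floor(train_frac * nrow(data)))
    train <- data[idx, ]
    test  <- data[-idx, ]
    # class centroids on the numeric columns
    centroids <- aggregate(train[, 1:4], by = list(Species = train$Species), FUN = mean)
    predict_one <- function(x) {
      d <- apply(centroids[, -1], 1, function(m) sum((x - m)^2))
      as.character(centroids$Species)[which.min(d)]
    }
    preds <- apply(test[, 1:4], 1, predict_one)
    mean(preds == test$Species)  # holdout accuracy for this split
  })
}

set.seed(1)
accs <- holdout_accuracy(iris, n = 20)
mean(accs)  # accuracy averaged over the 20 holdout rounds
```

Averaging over many random splits reduces the variance of the accuracy estimate, which is why each preprocessing combination is evaluated with `nholdout` repetitions rather than a single split.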

Execution wall-clock time in minutes can be extracted with:

`getwalltime(examplesearch)`

`## [1] 12`

The resulting near-optimal solution can be extracted with the getbestheur() function.

`getbestheur(examplesearch)`

```
## [[1]]
## imputation smoothing scaling outliers selection
## 3 meanclassimpute noaction noaction noaction noaction
##
## [[2]]
## [1] 0.9127451
```

The logs can be extracted; the second argument limits the output to the first 15 entries:

`getlogs(examplesearch, 15)`

```
## [1] "Start type: random restarts."
## [2] "Number of restarts: 1"
## [3] "Start combination: 179"
## [4] "Iteration: 1 Current best: knnimpute lowesssmooth softmaxscale orhoutlier noaction 0.8"
## [5] "Iteration: 1 Candidate: knnimpute lowesssmooth noaction orhoutlier noaction 0.81"
## [6] "Temperature: 0"
## [7] "Comparison value for late acceptance: 0.8"
## [8] "History delta, last five: 0.01"
## [9] "Iteration: 2 Current best: knnimpute lowesssmooth noaction orhoutlier noaction 0.81"
## [10] "Iteration: 2 Candidate: knnimpute noaction noaction orhoutlier noaction 0.86"
## [11] "Temperature: 0"
## [12] "Comparison value for late acceptance: 0.81"
## [13] "History delta, last five: 0.03"
## [14] "Iteration: 3 Current best: knnimpute noaction noaction orhoutlier noaction 0.86"
## [15] "Iteration: 3 Candidate: naomit noaction noaction orhoutlier noaction 0.82"
```
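The log lines above reflect the acceptance and stopping machinery: late acceptance compares a candidate against the objective value from several iterations back, and the "history delta" tracks recent improvement for convergence-based stopping. A minimal illustrative sketch follows; the lag length and tolerance are assumptions for demonstration, not the package's defaults:

```r
# Illustrative late acceptance: accept the candidate if it is at least
# as good as the objective value 'lag' iterations ago.
late_accept <- function(history, candidate, lag = 5) {
  ref <- history[max(1, length(history) - lag + 1)]
  candidate >= ref
}

# Illustrative convergence check: stop when the objective has improved
# by less than 'tol' over the last five iterations.
converged <- function(history, tol = 0.01) {
  length(history) >= 5 && (tail(history, 1) - tail(history, 5)[1]) < tol
}

late_accept(c(0.80, 0.81, 0.86), 0.82, lag = 3)  # TRUE: 0.82 beats the value 3 steps back (0.80)
converged(c(0.86, 0.86, 0.86, 0.86, 0.86))       # TRUE: no improvement over the last five
```

Comparing against an older value lets the search temporarily accept candidates that are worse than the current best, which helps it escape local optima.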

Metaheuristic search hyperparameters can be optimized with either grid or random search.
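The difference between the two strategies can be sketched generically. The hyperparameter names and ranges below are made up for illustration and are not metaheurhyper()'s actual arguments:

```r
# Illustrative grid vs. random hyperparameter search over two
# hypothetical hyperparameters; not metaheurhyper()'s internals.
grid_search <- expand.grid(temperature = c(0, 0.05, 0.1),
                           taboolength = c(1, 3, 5))  # all 9 combinations

set.seed(1)
random_search <- data.frame(temperature = runif(9, 0, 0.1),
                            taboolength = sample(1:5, 9, replace = TRUE))

nrow(grid_search)    # 9: exhaustive over the predefined grid
nrow(random_search)  # 9: same budget, but sampled from the ranges
```

With the same trial budget, random search covers more distinct values per hyperparameter, while grid search guarantees systematic coverage of the chosen grid points.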

`examplehyperparam <- metaheurhyper(examplegrid, cores=2, trials=10, iterations=54, nholdout=10)`

The hyperparameter search results can be inspected by plotting the search paths.

`plotsearchpath(examplehyperparam)`

The plot title shows the hyperparameter combination with the highest mean of the best-of-run values across trials. The panels show the best, worst and median scenarios for that combination. In this example, the best hyperparameters were surprisingly explorative.