To demonstrate the package functionality we are going to train a simple bat detector and test it on new sampling recordings. We start by loading the soundClass package:


Please note that before using soundClass for the first time it is necessary to have tensorflow and keras backend installed. This can be achieved by running the function install_keras(). If prompted to install ‘miniconda’ type ‘Y’ to install it:

keras::install_keras() # only needed before first use of soundClass

To run the example, external data must be downloaded and uncompressed. We define that folder as the working directory and rename it to “data_example” for convenience:


The data should now be manually downloaded to the new folder and unziped: https://doi.org/10.6084/m9.figshare.19550605 The downloaded data includes an already created annotations database, as well as the necessary recordings for training and validation. The database was created with the app_label(), as there is no way of creating it by scripting, and contains the annotations of relevant and non-relevant sound events in the supplied training recordings. After unzip, the data_example folder should have the following structure:

  |     db_bat_calls.sqlite3
  └──── training_recordings
  └──── validation_recordings

A custom blank model is provided with the package but should be copied to an external folder for easy access, specially for using the app_model():

model_path <- system.file("model_architectures", "model_vgg_sequential.R", package="soundClass")
file.copy(model_path, ".") # Copy custom blank model to the folder 'data_example'

If an annotations database already exists, there are two ways of using the package to fit or to run a fitted model: 1) by using the app_model() or 2) by scripting as shown in this example. To create the train data, spectrogram images with a pre-defined size and centered at the annotations of the recordings are calculated with the function spectro_calls(). Several parameters regarding the computation of the spectrograms must be set to run this function. Some parameters refer to image resolution and are generalist but the following three are directly related to the type of events being classified: “spec_size”, “window_length” and “freq_range”. These parameters refer to the total size of the spectrogram, the size of the moving window and the frequencies range in analysis and should be chosen to cover: the maximum event length, the rate of change (quickly changing events should use shorter windows) and the typical frequencies range of the events being classified. As European insectivorous bats (excluding Rhinolophus genus) are the target, we chose appropriate settings based on bibliography:

# To use the app instead of scripting:
# app_model()

train_calls <- spectro_calls(
  files_path = "./training_recordings/",
  db_path = "./db_bat_calls.sqlite3",
  spec_size = 20,
  window_length = 0.5, 
  frequency_resolution = 1,
  overlap =  0.5,
  dynamic_range = 100,
  freq_range = c(10, 80),
  tx = "auto",
  seed = 1002)

# save the object to disk
save(train_calls, file = "train_calls.RDATA")

Once the data has been prepared, a blank model must be loaded into R and compiled. A pre-defined model is supplied with the package. Two variables must be defined in the global environment before loading the model: the input shape (input_shape) with the format “c(number of rows, number of columns, number of channels)” and the number of classes (num_classes). These values can be found in the metadata object created with train_metada(). Please note that the number of channels is always 1, as we are working with grayscale images:

input_shape <- c(metadata$parameters$img_rows, metadata$parameters$img_cols, 1)
num_classes <- metadata$parameters$num_classes
model_path <- system.file("model_architectures", "model_vgg_sequential.R", package="soundClass")
source(model_path, local = TRUE)

With the training data prepared and the blank model loaded, the parameters for compiling and fitting the model must be set. Please note that as we are using classification models, both parameters “loss” and “metrics” should always be set as they are in this example. The other parameters can be tuned as desired. For further information about the parameters in the optimizer please refer to Keras documentation: https://keras.io/api/.

 model %>%
   optimizer = keras::optimizer_sgd(
                learning_rate = 0.01,
                momentum = 0.9,
                nesterov = TRUE),
   loss = 'categorical_crossentropy',
   metrics = 'accuracy'

The model is now ready to be fitted. Three callbacks (objects that can perform actions at various stages of training) are defined: 1) early stopping, 2) model checkpoint and 3) CSV logger. These objects permit to, respectively: 1) stop the fitting process if there is no improvement in the validation dataset accuracy for a defined number of epochs, 2) save the partially fitted model to disk (the workfolder in this example) after each iteration and keep only the best model after training for further use and 3) save to disk the log (the workfolder in this example) of the training process. For further information about the callbacks available and their usage please refer to Keras documentation (https://keras.io/api/). To fit the model the function keras::fit() is used:

 model %>% keras::fit(train_calls$data_x,
        batch_size = 128,
        epochs = 20,
        callbacks = list(
         keras::callback_early_stopping(patience = 4, monitor = 'val_accuracy'),
                      monitor = "val_accuracy", save_best_only = T),
        shuffle = TRUE,
        validation_split = 0.3,
        verbose = 1)

To run a a fitted model later, some information regarding the parameters used for creating the training spectrograms will be needed. These data can be obtained from the train_calls object:

metadata <- train_metadata(train_calls)

# save the object to disk
save(metadata, file = "./fitted_model_metadata.RDATA")

The fitted model and an additional log file are saved to disk in the working folder for further use in the classification of novel recordings. To validate our model, we use recordings not used to calibrate the model. We use recordings with the same species used for training but also new bat species to evaluate the model transferability. The classification is applied to the recordings with function auto_id() and the results are saved to a folder created in the same folder where the recordings are, in this cased named “output”:

auto_id(model_path = "./fitted_model.hdf5",
        metadata = "./fitted_model_metadata.RDATA", # or "./fitted_model_metadata.RDATA" (the path to the previously saved metadata file)
        file_path = "./validation_recordings/",
        out_file = "id_results",
        out_dir = "./validation_recordings/output/",
        save_png = TRUE,          
        win_size = 40,
        remove_noise = TRUE,
        plot2console = FALSE,
        recursive = FALSE,
        tx = "auto")

The classification process outputs a database in the sqlite3 format with all the relevant events detected and the respective probability to belong to a given class. Additionally a file in the CSV format is saved to disk, containing summary statistics per recording, i.e. the class with most events detected in each particular recording and the average frequency of maximum energy of the events detected, for reporting and manual review. If desired, one spectrogram per recording with the temporal position of the events detected may also be obtained.