Introduction

Small movements of tagged animals result in discernible variations in the strength of the received signal (Cochran et al. 1965; Kjos and Cochran 1970) that reflect changes in the angle and distance between transmitter and receiver. Kays et al. (2011) proposed a method for automatically classifying active and passive behaviour based on a threshold difference in the signal strength of successive VHF signals recorded by a commercial automatic radio-tracking system. However, machine learning (ML) algorithms are optimised for the recognition of complex patterns in a dataset and are typically robust against factors that influence signal propagation, such as changes in temperature and humidity, physical contact with conspecifics and/or multipath signal propagation (Alade 2013). Accordingly, a ML model trained with a dataset encompassing the possible diversity of signal patterns related to active and passive behaviour can be expected to perform at least as well as a threshold-based approach. In this work, we built on the methodology of Kays et al. (2011) by calibrating two random forest models (one for data coming from only one receiver and one for data coming from at least two receivers), based on millions of data points representing the behaviours of multiple tagged individuals of two temperate bat species (Myotis bechsteinii, Nyctalus leisleri).

The method was tested by applying it to independent data from bats, humans, and a bird species and then comparing the results with those obtained using the threshold-based approach of Kays et al. (2011), applying the threshold of a 2.5 dB signal strength difference suggested by Schofield et al. (2018).

In order to make our work comprehensible, code and data are made available to all interested parties. Data for model training can be found here. Data for evaluation is stored here.

This resource contains the following steps:

  • Training two models in conjunction with forward feature selection
  • Parameter tuning
  • Validation using three independent data sets
  • Comparison with a threshold-based approach as proposed by Kays et al. (2011)

But before we get started:

Why Random Forest?

Although deep learning methods have been successfully applied to several ecological problems where large amounts of data are available (Christin, Hervet, and Lecomte 2019), we use a random forest model for the following reasons:

    1. developing a (supervised) deep learning method requires considerable effort for selecting an appropriate neural network architecture, choosing an appropriate framework to implement the neural network, and training, validating, testing, and refining the neural network (Christin, Hervet, and Lecomte 2019)
    2. essentially, we have to solve a binary classification task based on tabular data; in this setting, tree ensemble methods such as random forests seem to have clear advantages - they are less computationally intensive, easy to implement, robust, and at least as performant as deep learning (Shwartz-Ziv and Armon 2022)
    3. in a large study comparing 179 classifiers applied to the 121 classification data sets of the UCI repository, random forests were the best classifiers in over 90% of the cases (Fernández-Delgado et al. 2014).

Model training and tuning

For model training and tuning we use the caret R package (Kuhn 2008). For the forward feature selection we use the CAST R package (Meyer et al. 2018).

Additional packages needed are: randomForest, ranger, doParallel, MLeval, data.table, dplyr, plyr.

Load packages

library(caret)
library(randomForest)
library(ranger)
library(doParallel)
library(MLeval)
library(CAST)
library(data.table)
library(dplyr)
library(plyr)

Data for model training

Only one antenna is necessary to classify VHF signals into active vs. passive states (Kays et al. 2011). However, agreement between receivers of the same station provides additional information and can improve the reliability of the classification. Our ground-truth dataset was therefore balanced by randomly down-sampling the more frequent activity class to the size of the less frequent class. The balanced data originating from one receiver was then split into 50% training data and 50% test data. The same procedure was applied to data derived from the signals of two receivers, resulting in two training and two test datasets. From a total of 3,243,753 VHF signals, 124,898 signals were assigned to train the two-receiver model and 294,440 signals to train the one-receiver model (Table 1).
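As a minimal sketch of this balancing and splitting step (the input file name and the behaviour column are placeholders, not the exact objects used in our pipeline), caret's downSample and createDataPartition can be combined as follows:

# hypothetical input: one row per VHF signal, ground-truth label in column 'behaviour'
raw <- readRDS("model_tuning/data/bats_groundtruth.rds")   # placeholder file name

# down-sample the larger class to the size of the smaller one;
# downSample() appends the label as a new factor column called "Class"
set.seed(10)
balanced <- downSample(x = raw[, setdiff(names(raw), "behaviour")],
                       y = factor(raw$behaviour))

# stratified 50/50 split into training and test data
idx <- createDataPartition(balanced$Class, p = 0.5, list = FALSE)
batsTrain <- balanced[idx, ]
batsTest  <- balanced[-idx, ]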

Feature selection

Since not all variables are equally important to the model and some may even be misleading, we performed a forward feature selection on 50% of the training data. The forward feature selection algorithm implemented in the R package CAST (Meyer et al. 2018) selects the best pair out of all possible two-variable combinations by evaluating the performance of a k-fold cross-validation (CV). The algorithm then iteratively increases the number of predictors until adding further variables no longer improves the performance.

1 Receiver model

# get data and check class distribution
data_1<-readRDS("model_tuning/data/batsTrain_1_receiver.rds")

table(data_1$Class)
## 
##  active passive 
##  294173  294173
#forward feature selection

predictors<-names(data_1[, -ncol(data_1)])


cl<-makeCluster(10)

registerDoParallel(cl)

ctrl <- trainControl(## 10-fold CV
  method = "cv",
  number = 10)


#run ffs model with 10-fold CV
set.seed(10)

ffsmodel <- ffs(predictors=data_1[,predictors],response = data_1$Class,method="rf",
                metric="Kappa",
                tuneLength = 1,
                trControl=ctrl,
                verbose = TRUE)

ffsmodel$selectedvars


saveRDS(ffsmodel, "model_tuning/models/m_r1.rds")

stopCluster(cl)

Results feature selection

Red dots represent all two-variable combinations; dots coloured from yellow to pink represent the models to which one further variable has been added in each iteration. Dots with a black border mark the best variable combination of the respective iteration.

m1<-readRDS("model_tuning/models/m_r1.rds")

print(m1)
## Random Forest 
## 
## 588346 samples
##      7 predictor
##      2 classes: 'active', 'passive' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 529512, 529512, 529511, 529512, 529511, 529511, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9631628  0.9263257
## 
## Tuning parameter 'mtry' was held constant at a value of 2
print(plot_ffs(m1))

Variable importance 1 receiver model

plot(varImp(m1))

2 Receivers model

#get data and check class distribution
data_2<-readRDS("model_tuning/data/batsTrain_2_receivers.rds")

table(data_2$Class)
## 
##  active passive 
##  110274  110274
predictors<-names(data_2[, -ncol(data_2)])


cl<-makeCluster(10)

registerDoParallel(cl)

ctrl <- trainControl(## 10-fold CV
  method = "cv",
  number = 10)

#run ffs model
set.seed(10)

ffsmodel <- ffs(predictors=data_2[,predictors],response = data_2$Class,method="rf",
                metric="Kappa",
                tuneLength = 1,
                trControl=ctrl,
                verbose = TRUE)

ffsmodel$selectedvars

saveRDS(ffsmodel, "model_tuning/models/m_r2.rds")

stopCluster(cl)

Results feature selection

Red dots represent all two-variable combinations; dots coloured from yellow to pink represent the models to which one further variable has been added in each iteration. Dots with a black border mark the best variable combination of the respective iteration.

m2<-readRDS("model_tuning/models/m_r2.rds")
print(m2)
## Random Forest 
## 
## 220548 samples
##      8 predictor
##      2 classes: 'active', 'passive' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 198494, 198493, 198494, 198494, 198492, 198492, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9740011  0.9480022
## 
## Tuning parameter 'mtry' was held constant at a value of 2
print(plot_ffs(m2))

Variable importance 2 receivers model

plot(varImp(m2))

Model tuning

Random forest is an algorithm which is far less tunable than other algorithms such as support vector machines (Probst, Wright, and Boulesteix 2019) and is known to provide good results with the default settings of existing software packages (Fernández-Delgado et al. 2014). Even though the expected performance gain is small, tuning the parameter mtry provides the biggest average improvement of the AUC (0.006) (Probst et al. 2018). mtry is defined as the number of randomly drawn candidate variables out of which each split is selected when growing a tree. Here we reduce the existing predictor variables to those selected by the forward feature selection and iteratively increase the number of randomly drawn candidate variables from 1 to the total number of selected variables. Other parameters, such as the number of trees, are held constant according to the default settings of the packages used.

Tuning mtry on the 1 receiver model

#reduce to ffs variables
predictors<-names(data_1[, c(m1$selectedvars, "Class")])
batsTune<-data_1[, predictors]

#tune the number of candidate variables per split (mtry); number of trees is held constant
ctrl <- trainControl(## 10-fold CV
  method = "cv",
  number = 10,
  verboseIter = TRUE
)

tunegrid <- expand.grid(
  mtry = 1:(length(predictors)-1),                                  # mtry specified here
  splitrule = "gini"
  ,min.node.size = 10
)

# splitrule and min.node.size are ranger tuning parameters, so the ranger backend is used
tuned_model <- train(Class~.,
                    data=batsTune,
                    method='ranger',
                    metric='Kappa',
                    tuneGrid=tunegrid,
                    num.trees=1000,
                    trControl=ctrl)

saveRDS(tuned_model,"model_tuning/models/m_r1_tuned.rds")

Results model tuning 1 receiver

m1_tuned<-readRDS("model_tuning/models/m_r1_tuned.rds")

print(m1_tuned)
## Random Forest 
## 
## 588346 samples
##      7 predictor
##      2 classes: 'active', 'passive' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 529510, 529512, 529511, 529511, 529511, 529512, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   1     0.9591601  0.9183202
##   2     0.9616518  0.9233036
##   3     0.9619646  0.9239291
##   4     0.9618371  0.9236742
##   5     0.9615039  0.9230079
##   6     0.9610569  0.9221139
##   7     0.9602819  0.9205637
## 
## Tuning parameter 'splitrule' was held constant at a value of gini
## 
## Tuning parameter 'min.node.size' was held constant at a value of 10
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 3, splitrule = gini
##  and min.node.size = 10.

Tuning mtry on the 2 receivers model

#reduce to ffs variables
predictors<-names(data_2[, c(m2$selectedvars, "Class")])
batsTune<-data_2[, predictors]

#tune number of variable evaluated per tree- number of trees is 1000
ctrl <- trainControl(## 10-fold CV
  method = "cv",
  number = 10,
  verboseIter = TRUE
)


tunegrid <- expand.grid(
  mtry = 1:(length(predictors)-1),                                  # mtry specified here
  splitrule = "gini"
  ,min.node.size = 10
)
# as above, use the ranger backend because the grid tunes splitrule and min.node.size
tuned_model_2 <- train(Class~.,
                     data=batsTune,
                     method='ranger',
                     metric='Kappa',
                     tuneGrid=tunegrid,
                     num.trees=1000,
                     trControl=ctrl)
print(tuned_model_2)


saveRDS(tuned_model_2,"model_tuning/models/m_r2_tuned.rds")

Results model tuning 2 receivers

m2_tuned<-readRDS("model_tuning/models/m_r2_tuned.rds")
print(m2_tuned)
## Random Forest 
## 
## 220548 samples
##      8 predictor
##      2 classes: 'active', 'passive' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 198494, 198494, 198493, 198492, 198493, 198494, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   1     0.9719608  0.9439215
##   2     0.9724187  0.9448374
##   3     0.9717975  0.9435951
##   4     0.9712988  0.9425976
##   5     0.9710041  0.9420081
##   6     0.9707139  0.9414277
##   7     0.9703285  0.9406569
##   8     0.9702605  0.9405209
## 
## Tuning parameter 'splitrule' was held constant at a value of gini
## 
## Tuning parameter 'min.node.size' was held constant at a value of 10
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 2, splitrule = gini
##  and min.node.size = 10.

Results and discussion model training and tuning

Both models (based on data from one receiver and from two receivers) achieved very high performance metrics (Kappa, Accuracy), with slightly better results for the two-receiver model. Tuning the mtry parameter did not increase the performance, which indicates that the default settings are a good choice for our use case.

Model evaluation

To validate the model performance and its applicability to species with movement behaviour different from that of bats (e.g. in terms of speed), we generated three different data sets:

  1. We put 50% of our bat data aside
  2. We collected ground-truth data of a tagged middle spotted woodpecker
  3. We simulated different movement intensities by humans carrying transmitters through the forest

In this section we test how well the models perform in terms of different performance metrics such as F-score, Accuracy, and ROC-AUC.
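As a quick reference for how these metrics relate to the confusion matrices printed below (caret's confusionMatrix() and twoClassSummary() report them automatically), here is a toy hand computation with made-up counts:

# toy counts (not from our data): true/false positives and negatives of a binary classifier
tp <- 90; fp <- 10; fn <- 20; tn <- 80

accuracy  <- (tp + tn) / (tp + fp + fn + tn)
precision <- tp / (tp + fp)                      # Pos Pred Value
recall    <- tp / (tp + fn)                      # Sensitivity
f1        <- 2 * precision * recall / (precision + recall)

c(Accuracy = accuracy, Precision = precision, Recall = recall, F1 = f1)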

Bats

We first take a look at the 50% test data that has been put aside for evaluation. Here we actually perform the prediction using the two trained models. For the woodpecker and human walk data sets we use already predicted data that has been processed by the scripts validation_woodpecker and validation_human_activity.

Test data collected by one receiver

# Testdata 1 receiver
Test_1<-readRDS("validation/bats/data/batsTest_1_receiver.rds")
print(table(Test_1$Class))
## 
##  active passive 
##  294172  294172
# Default names as expected in Caret
Test_1$obs<-factor(Test_1$Class)

#get binary prediction
pred1<-predict(m1, Test_1)
Test_1$pred<-factor(pred1)

#probabilities
prob<-predict(m1, Test_1, type="prob")
Test_1<-cbind(Test_1, prob)
#calculate roc-auc 

roc1 <- MLeval::evalm(data.frame(prob, Test_1$obs))
saveRDS(roc1, "validation/bats/results/roc_1receiver.rds")

Performance metrics for 1-receiver test data of bats

#create confusion matrix
cm_r1<- confusionMatrix(factor(Test_1$pred), factor(Test_1$Class))
print(cm_r1)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active  282861   10220
##    passive  11311  283952
##                                           
##                Accuracy : 0.9634          
##                  95% CI : (0.9629, 0.9639)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9268          
##                                           
##  Mcnemar's Test P-Value : 1.099e-13       
##                                           
##             Sensitivity : 0.9615          
##             Specificity : 0.9653          
##          Pos Pred Value : 0.9651          
##          Neg Pred Value : 0.9617          
##              Prevalence : 0.5000          
##          Detection Rate : 0.4808          
##    Detection Prevalence : 0.4981          
##       Balanced Accuracy : 0.9634          
##                                           
##        'Positive' Class : active          
## 
print(cm_r1$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9615497            0.9652584            0.9651291 
##       Neg Pred Value            Precision               Recall 
##            0.9616918            0.9651291            0.9615497 
##                   F1           Prevalence       Detection Rate 
##            0.9633361            0.5000000            0.4807749 
## Detection Prevalence    Balanced Accuracy 
##            0.4981456            0.9634041

ROC-AUC for 1-receiver test data of bats

#
twoClassSummary(Test_1, lev = levels(Test_1$obs))
##       ROC      Sens      Spec 
## 0.9942587 0.9615497 0.9652584
roc1 <- readRDS("validation/bats/results/roc_1receiver.rds")
print(roc1$roc)

Bats: Test data collected by two receivers

#two receivers
Test_2<-readRDS("validation/bats/data/batsTest_2_receivers.rds")

table(Test_2$Class)
## 
##  active passive 
##  110273  110273
Test_2$obs<-Test_2$Class
#get binary prediction
pred2<-predict(m2, Test_2)
Test_2$pred<-pred2
#probabilities
prob2<-predict(m2, Test_2, type="prob")
Test_2<-cbind(Test_2, prob2)
#calculate roc-auc
roc2 <- MLeval::evalm(data.frame(prob2, Test_2$obs))

saveRDS(roc2, "validation/bats/results/roc_2receivers.rds")

Performance metrics for 2-receiver test data of bats

cm_r2<- confusionMatrix(factor(Test_2$pred), factor(Test_2$obs))
print(cm_r2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active  107568    2746
##    passive   2705  107527
##                                           
##                Accuracy : 0.9753          
##                  95% CI : (0.9746, 0.9759)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9506          
##                                           
##  Mcnemar's Test P-Value : 0.588           
##                                           
##             Sensitivity : 0.9755          
##             Specificity : 0.9751          
##          Pos Pred Value : 0.9751          
##          Neg Pred Value : 0.9755          
##              Prevalence : 0.5000          
##          Detection Rate : 0.4877          
##    Detection Prevalence : 0.5002          
##       Balanced Accuracy : 0.9753          
##                                           
##        'Positive' Class : active          
## 
print(cm_r2$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9754700            0.9750982            0.9751074 
##       Neg Pred Value            Precision               Recall 
##            0.9754608            0.9751074            0.9754700 
##                   F1           Prevalence       Detection Rate 
##            0.9752887            0.5000000            0.4877350 
## Detection Prevalence    Balanced Accuracy 
##            0.5001859            0.9752841

ROC-AUC for 2-receiver test data of bats

#
twoClassSummary(Test_2, lev = levels(Test_2$obs))
roc2 <- readRDS("validation/bats/results/roc_2receivers.rds")
print(roc2$roc)

Woodpecker

# woodpecker ground-truth data (predictions already attached)
wp<-readRDS("validation/woodpecker/data/woodpecker_groundtruth.rds")

wp$obs<-as.factor(wp$observed)
wp$pred<-as.factor(wp$prediction)

Performance metrics woodpecker

#create confusion matrix
cm_wp<- confusionMatrix(wp$pred, wp$obs)
print(cm_wp)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active    8309      31
##    passive    432    7969
##                                           
##                Accuracy : 0.9723          
##                  95% CI : (0.9697, 0.9748)
##     No Information Rate : 0.5221          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9447          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9506          
##             Specificity : 0.9961          
##          Pos Pred Value : 0.9963          
##          Neg Pred Value : 0.9486          
##              Prevalence : 0.5221          
##          Detection Rate : 0.4963          
##    Detection Prevalence : 0.4982          
##       Balanced Accuracy : 0.9734          
##                                           
##        'Positive' Class : active          
## 
print(cm_wp$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9505777            0.9961250            0.9962830 
##       Neg Pred Value            Precision               Recall 
##            0.9485776            0.9962830            0.9505777 
##                   F1           Prevalence       Detection Rate 
##            0.9728939            0.5221313            0.4963264 
## Detection Prevalence    Balanced Accuracy 
##            0.4981781            0.9733514

ROC-AUC Woodpecker

print(twoClassSummary(wp, lev = levels(wp$obs)))
##       ROC      Sens      Spec 
## 0.9982197 0.9505777 0.9961250
roc_wp <- MLeval::evalm(data.frame(wp[, c("active", "passive")], wp$obs), plots=c("r"))

#print(roc_wp$roc)

Human activity

# human activity ground-truth data (predictions already attached)
hm<-readRDS("validation/human/data/human_walk_groundtruth.rds")
hm$obs<-factor(hm$observation)
hm$pred<-factor(hm$prediction)

Performance metrics human activity

#create confusion matrix
cm_hm<- confusionMatrix(factor(hm$pred), factor(hm$obs))
print(cm_hm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active   25787     280
##    passive    717    5870
##                                           
##                Accuracy : 0.9695          
##                  95% CI : (0.9675, 0.9713)
##     No Information Rate : 0.8117          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9028          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9729          
##             Specificity : 0.9545          
##          Pos Pred Value : 0.9893          
##          Neg Pred Value : 0.8911          
##              Prevalence : 0.8117          
##          Detection Rate : 0.7897          
##    Detection Prevalence : 0.7983          
##       Balanced Accuracy : 0.9637          
##                                           
##        'Positive' Class : active          
## 
print(cm_hm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9729475            0.9544715            0.9892584 
##       Neg Pred Value            Precision               Recall 
##            0.8911492            0.9892584            0.9729475 
##                   F1           Prevalence       Detection Rate 
##            0.9810352            0.8116617            0.7897042 
## Detection Prevalence    Balanced Accuracy 
##            0.7982789            0.9637095

ROC-AUC human activity

twoClassSummary(hm, lev = levels(hm$obs))
##       ROC      Sens      Spec 
## 0.9902507 0.9729475 0.9544715
roc_hm <- MLeval::evalm(data.frame(hm[, c("active", "passive")], hm$obs),plots=c("r"))

#print(roc_hm$roc)

Results random-forest model validation

Regardless of whether the models were tested on independent test data from bats or on data from other species (human, woodpecker), the performance metrics were always close to their maxima.

Comparison to a threshold based approach

The results of the ML-based approach were compared with those of a threshold-based approach (Kays et al. 2011) by calculating the difference in signal strength between successive signals for all three test datasets (bats, bird, humans). We applied a threshold of 2.5 dB, which was deemed appropriate to optimally separate active and passive behaviours in previous studies. In addition, the optimize function of the R package stats (R Core Team 2021) was used to identify the value of the signal strength difference that separated the training dataset into active and passive with the highest accuracy. This value was also applied to all three test datasets.

Bats

Threshold optimisation

To find the threshold value that maximises the accuracy (the data is balanced) when separating the data into active and passive, we first calculated the signal strength difference of consecutive signals in the complete bat data set, then split it into 50% balanced train and test data, and finally used the optimize function from the stats package to determine the best threshold.

#get all bat data
trn<-fread("validation/bats/data/train_2020_2021.csv")

#calculate signal strength difference per station
dtrn<-plyr::ldply(unique(trn$station), function(x){
  
  tmp<-trn[trn$station==x,]
  tmp<-tmp[order(tmp$timestamp),]
  tmp<-tmp%>%group_by(ID)%>%
    mutate(Diff = abs(max_signal - lag(max_signal)))
  return(tmp)
  })

##data clean up
dtrn<-dtrn[!is.na(dtrn$Diff),]
dtrn<-dtrn[!(dtrn$behaviour=="active" & dtrn$Diff==0),]

##factorize
dtrn$behaviour<-as.factor(dtrn$behaviour)
table(dtrn$behaviour)
## 
##  active passive 
##  513831 2654868
#balance data
set.seed(10)

tdown<-downSample(x = dtrn,
                  y = dtrn$behaviour)

#create 50% train and test

trainIndex <-createDataPartition(tdown$Class, p = .5, 
                                 list = FALSE, 
                                 times = 1)

dtrn <- tdown[ trainIndex,]
dtst  <- tdown[-trainIndex,]

#optimize separation value based on accuracy (remember the data is balanced)

value<-dtrn$Diff
group<-dtrn$behaviour

# classify as "active" if the signal strength difference exceeds the threshold th, otherwise
# "passive"; return the fraction of signals where this matches the observed behaviour
accuracy = Vectorize(function(th) mean(c("passive", "active")[(value > th) + 1] == group))
ac<-optimize(accuracy, c(min(value, na.rm=TRUE), max(value, na.rm=TRUE)), maximum=TRUE)

ac$maximum
## [1] 1.088167

Performance metrics based on optimized threshold

#classify data by optimized value
dtst$pred<-NA
dtst$pred[dtst$Diff>ac$maximum]<-"active"
dtst$pred[dtst$Diff<=ac$maximum]<-"passive"

#calc confusion matrix
dtst$pred<-factor(dtst$pred)
cm<-confusionMatrix(factor(dtst$Class), factor(dtst$pred))

print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active  198976   57939
##    passive  81121  175794
##                                           
##                Accuracy : 0.7294          
##                  95% CI : (0.7281, 0.7306)
##     No Information Rate : 0.5451          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4587          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.7104          
##             Specificity : 0.7521          
##          Pos Pred Value : 0.7745          
##          Neg Pred Value : 0.6842          
##              Prevalence : 0.5451          
##          Detection Rate : 0.3872          
##    Detection Prevalence : 0.5000          
##       Balanced Accuracy : 0.7312          
##                                           
##        'Positive' Class : active          
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.7103825            0.7521146            0.7744818 
##       Neg Pred Value            Precision               Recall 
##            0.6842497            0.7744818            0.7103825 
##                   F1           Prevalence       Detection Rate 
##            0.7410486            0.5451161            0.3872409 
## Detection Prevalence    Balanced Accuracy 
##            0.5000000            0.7312485

Performance metrics based on 2.5 dB threshold from the literature

#2.5 dB value from the literature
dtst$pred<-NA
dtst$pred[dtst$Diff>2.5]<-"active"
dtst$pred[dtst$Diff<=2.5]<-"passive"

dtst$pred<-factor(dtst$pred)
cm<-confusionMatrix(dtst$Class, dtst$pred)
print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active  143475  113440
##    passive  48093  208822
##                                           
##                Accuracy : 0.6856          
##                  95% CI : (0.6844, 0.6869)
##     No Information Rate : 0.6272          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3713          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.7490          
##             Specificity : 0.6480          
##          Pos Pred Value : 0.5585          
##          Neg Pred Value : 0.8128          
##              Prevalence : 0.3728          
##          Detection Rate : 0.2792          
##    Detection Prevalence : 0.5000          
##       Balanced Accuracy : 0.6985          
##                                           
##        'Positive' Class : active          
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.7489508            0.6479883            0.5584532 
##       Neg Pred Value            Precision               Recall 
##            0.8128058            0.5584532            0.7489508 
##                   F1           Prevalence       Detection Rate 
##            0.6398236            0.3728237            0.2792266 
## Detection Prevalence    Balanced Accuracy 
##            0.5000000            0.6984695

Woodpecker

Since the activity observations are not continuous but the signal recording on the tRackIT stations is, we first have to calculate the signal strength difference on the raw data and then match it to the ground-truth observations.

#list raw signals
wp<-list.files("validation/woodpecker/data/raw/", full.names = TRUE)


#calculate signal strength difference
wp_tst<-plyr::ldply(wp, function(x){
  
  tmp<-fread(x)
  tmp<-tmp[order(tmp$timestamp),]
  tmp<-tmp%>%mutate(Diff = abs(max_signal - lag(max_signal)))
  return(tmp)
})

wp_tst$timestamp<-lubridate::with_tz(wp_tst$timestamp, "CET")

#get observations and merge by timestamp

wp_gtruth<-readRDS("validation/woodpecker/data/woodpecker_groundtruth.rds")

wp_tst<-merge(wp_gtruth, wp_tst, all.x = TRUE)

Performance metrics based on optimized threshold

wp_tst$pred<-NA
wp_tst$pred[wp_tst$Diff>ac$maximum]<-"active"
wp_tst$pred[wp_tst$Diff<=ac$maximum]<-"passive"

wp_tst$pred<-factor(wp_tst$pred)
wp_tst$observed<-factor(wp_tst$observed)

cm<-confusionMatrix(factor(wp_tst$observed), factor(wp_tst$pred))

print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active    8191    3822
##    passive    590    7692
##                                           
##                Accuracy : 0.7826          
##                  95% CI : (0.7769, 0.7883)
##     No Information Rate : 0.5673          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5757          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9328          
##             Specificity : 0.6681          
##          Pos Pred Value : 0.6818          
##          Neg Pred Value : 0.9288          
##              Prevalence : 0.4327          
##          Detection Rate : 0.4036          
##    Detection Prevalence : 0.5919          
##       Balanced Accuracy : 0.8004          
##                                           
##        'Positive' Class : active          
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9328095            0.6680563            0.6818447 
##       Neg Pred Value            Precision               Recall 
##            0.9287612            0.6818447            0.9328095 
##                   F1           Prevalence       Detection Rate 
##            0.7878234            0.4326681            0.4035969 
## Detection Prevalence    Balanced Accuracy 
##            0.5919192            0.8004329

Performance metrics based on 2.5 dB threshold from the literature

#evaluate with 2.5 dB value from the literature
wp_tst$pred<-NA
wp_tst$pred[wp_tst$Diff>2.5]<-"active"
wp_tst$pred[wp_tst$Diff<=2.5]<-"passive"

wp_tst$pred<-factor(wp_tst$pred)
cm<-confusionMatrix(wp_tst$observed, wp_tst$pred)
print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active    5499    6514
##    passive    284    7998
##                                           
##                Accuracy : 0.665           
##                  95% CI : (0.6585, 0.6715)
##     No Information Rate : 0.7151          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3792          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.9509          
##             Specificity : 0.5511          
##          Pos Pred Value : 0.4578          
##          Neg Pred Value : 0.9657          
##              Prevalence : 0.2849          
##          Detection Rate : 0.2710          
##    Detection Prevalence : 0.5919          
##       Balanced Accuracy : 0.7510          
##                                           
##        'Positive' Class : active          
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9508905            0.5511301            0.4577541 
##       Neg Pred Value            Precision               Recall 
##            0.9657088            0.4577541            0.9508905 
##                   F1           Prevalence       Detection Rate 
##            0.6180040            0.2849470            0.2709534 
## Detection Prevalence    Balanced Accuracy 
##            0.5919192            0.7510103

Humans

Human activity observations are also not continuous, so we have to calculate the signal strength difference for each individual on the raw data.

hm_dirs<-list.dirs("validation/human/data/", full.names = TRUE)
hm_dirs<-hm_dirs[grep("raw", hm_dirs)]
hm_tst<-plyr::ldply(hm_dirs, function(d){
  
  fls<-list.files(d, full.names = TRUE)
  
  tmp_dat<-plyr::ldply(fls, function(x){
  
  tmp<-fread(x)
  tmp<-tmp[order(tmp$timestamp),]
  tmp<-tmp%>%mutate(Diff = abs(max_signal - lag(max_signal)))
  return(tmp)
})
  
  return(tmp_dat)})

#get observations and merge
hm_gtruth<-readRDS("validation/human/data/human_walk_groundtruth.rds")
hm_tst<-merge(hm_gtruth, hm_tst, all.x = TRUE)
hm_tst<-hm_tst[!duplicated(hm_tst$timestamp),]

Performance metrics based on optimized threshold

#evaluate based on optimized threshold
hm_tst$pred<-NA
hm_tst$pred[hm_tst$Diff>ac$maximum]<-"active"
hm_tst$pred[hm_tst$Diff<=ac$maximum]<-"passive"

hm_tst$pred<-factor(hm_tst$pred)
hm_tst$observed<-factor(hm_tst$observation)

cm<-confusionMatrix(hm_tst$observed, hm_tst$pred)

print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active    9613    2292
##    passive    143    2030
##                                           
##                Accuracy : 0.827           
##                  95% CI : (0.8207, 0.8333)
##     No Information Rate : 0.693           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.5282          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9853          
##             Specificity : 0.4697          
##          Pos Pred Value : 0.8075          
##          Neg Pred Value : 0.9342          
##              Prevalence : 0.6930          
##          Detection Rate : 0.6828          
##    Detection Prevalence : 0.8456          
##       Balanced Accuracy : 0.7275          
##                                           
##        'Positive' Class : active          
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9853424            0.4696900            0.8074759 
##       Neg Pred Value            Precision               Recall 
##            0.9341924            0.8074759            0.9853424 
##                   F1           Prevalence       Detection Rate 
##            0.8875860            0.6929962            0.6828385 
## Detection Prevalence    Balanced Accuracy 
##            0.8456457            0.7275162
#print(cm$table)

Performance metrics based on 2.5 dB threshold from the literature

#evaluate based on 2.5 dB value from the literature 

hm_tst$pred<-NA
hm_tst$pred[hm_tst$Diff>2.5]<-"active"
hm_tst$pred[hm_tst$Diff<=2.5]<-"passive"

hm_tst$pred<-factor(hm_tst$pred)
cm<-confusionMatrix(hm_tst$observed, hm_tst$pred)

print(cm)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction active passive
##    active    7036    4869
##    passive     29    2144
##                                         
##                Accuracy : 0.6521        
##                  95% CI : (0.6441, 0.66)
##     No Information Rate : 0.5018        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.3024        
##                                         
##  Mcnemar's Test P-Value : < 2.2e-16     
##                                         
##             Sensitivity : 0.9959        
##             Specificity : 0.3057        
##          Pos Pred Value : 0.5910        
##          Neg Pred Value : 0.9867        
##              Prevalence : 0.5018        
##          Detection Rate : 0.4998        
##    Detection Prevalence : 0.8456        
##       Balanced Accuracy : 0.6508        
##                                         
##        'Positive' Class : active        
## 
print(cm$byClass)
##          Sensitivity          Specificity       Pos Pred Value 
##            0.9958953            0.3057180            0.5910122 
##       Neg Pred Value            Precision               Recall 
##            0.9866544            0.5910122            0.9958953 
##                   F1           Prevalence       Detection Rate 
##            0.7418028            0.5018469            0.4997869 
## Detection Prevalence    Balanced Accuracy 
##            0.8456457            0.6508066
print(cm$table)
##           Reference
## Prediction active passive
##    active    7036    4869
##    passive     29    2144

Comparison of the threshold-based approach and the random forest model

When calibrated on an adequate training data set, the threshold-based approach is generally able to separate active and passive behaviour, but its performance metrics (F1 = 0.74, 0.78, 0.89 for bats, woodpecker, and humans) are 10 to 20 percentage points lower and more variable than those of our random forest models (F1 = 0.97, 0.97, 0.98 for bats, woodpecker, and humans). With F-scores between 0.6 and 0.74, the threshold value proposed in the literature performed considerably worse.

Since only the bat test data set is balanced, while the woodpecker data is slightly imbalanced and the human activity data set is highly imbalanced, let us also take a look at a metric that takes the class distribution into account:

Cohen’s kappa is defined as:

K = (p_0 - p_e) / (1 - p_e)

where p_0 is the overall accuracy of the model and p_e is the expected agreement between the model predictions and the actual class values if agreement occurred purely by chance.
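As a small illustration of the formula (toy counts, not from our data; caret's confusionMatrix() reports the same quantity as Kappa):

# hand computation of Cohen's kappa from a 2x2 confusion matrix with toy counts
tab <- matrix(c(90, 10,
                20, 80), nrow = 2, byrow = TRUE,
              dimnames = list(pred = c("active", "passive"),
                              obs  = c("active", "passive")))

n  <- sum(tab)
p0 <- sum(diag(tab)) / n                       # observed accuracy
pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
(p0 - pe) / (1 - pe)                           # Cohen's kappa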

Cohen’s kappa is always less than or equal to 1. Values of 0 or less indicate that the classifier is no better than chance. Landis and Koch (1977) provide a way to characterise the values. According to their scheme, a value < 0 indicates no agreement, 0–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1 almost perfect agreement.

Kappa values based on the 2.5 dB separation value from the literature ranged between 0.30 (humans) and 0.38 (woodpecker), i.e. a fair agreement. For the optimized threshold, Kappa values were clearly better in all cases (0.46, 0.58, 0.53; bats, woodpecker, humans), i.e. a moderate agreement. However, even the best Kappa value for the threshold-based approach only showed a moderate agreement, while all Kappa values based on the random forest models showed an almost perfect agreement (0.94, 0.94, 0.90; bats, woodpecker, humans).

REFERENCES

Alade, Michael Olusope. 2013. “Investigation of the Effect of Ground and Air Temperature on Very High Frequency Radio Signals.” Journal of Information Engineering and Applications 3: 16–21.
Christin, Sylvain, Éric Hervet, and Nicolas Lecomte. 2019. “Applications for Deep Learning in Ecology.” Methods Ecol. Evol. 10 (10): 1632–44.
Cochran, W W, D W Warner, J R Tester, and V B Kuechle. 1965. “Automatic Radio-Tracking System for Monitoring Animal Movements.” Bioscience 15 (2): 98–100.
Fernández-Delgado, Manuel, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. “Do We Need Hundreds of Classifiers to Solve Real World Classification Problems?” J. Mach. Learn. Res. 15 (1): 3133–81.
Kays, Roland, Sameer Tilak, Margaret Crofoot, Tony Fountain, Daniel Obando, Alejandro Ortega, Franz Kuemmeth, et al. 2011. “Tracking Animal Location and Activity with an Automated Radio Telemetry System in a Tropical Rainforest.” Comput. J. 54 (12): 1931–48.
Kjos, Charles G, and William W Cochran. 1970. “Activity of Migrant Thrushes as Determined by Radio-Telemetry.” Wilson Bull., 225–26.
Landis, J R, and G G Koch. 1977. “An Application of Hierarchical Kappa-Type Statistics in the Assessment of Majority Agreement Among Multiple Observers.” Biometrics 33 (2): 363–74.
Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9.
Probst, Philipp, Marvin N Wright, and Anne-Laure Boulesteix. 2019. “Hyperparameters and Tuning Strategies for Random Forest.” Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9 (3): e1301.
Shwartz-Ziv, Ravid, and Amitai Armon. 2022. “Tabular Data: Deep Learning Is Not All You Need.” Inf. Fusion 81 (May): 84–90.