Deep learning algorithm outperforms experienced human observer at detection of blue whale D‐calls: a double‐observer analysis

Brian S. Miller, Shyam Madhusudhana, Meghan G. Aulich, Nat Kelly
DOI: https://doi.org/10.1002/rse2.297
IF: 5.7874
2022-08-30
Remote Sensing in Ecology and Conservation
Abstract: We describe an important advance in methods for characterising automated detectors of bioacoustic data from passive acoustic monitoring stations. In addition to describing a state‐of‐the‐art automated detector based on deep‐learning, our advance comes from applying double‐observer statistical methods, commonly used in visual surveys, to compare our deep‐learning detector directly to a human analyst. To the best of our knowledge, this is the first time that an automated bioacoustics detector has been directly compared to a human observer. We also quantify, via our case study on critically endangered Antarctic blue whales, the manner in which our modern AI detector of whale sounds was superior. An automated algorithm for passive acoustic detection of blue whale D‐calls was developed based on established deep learning methods for image recognition via the DenseNet architecture. The detector was trained on annotated acoustic recordings from the Antarctic, and performance of the detector was assessed by calculating precision and recall using a separate independent dataset also from the Antarctic. Detections from both the human analyst and automated detector were then inspected by an independent judge to identify any calls missed by either approach and to adjudicate whether the apparent false‐positive detections from the automated approach were actually true positives. A final performance assessment was conducted using double‐observer methods (via a closed‐population Huggins mark–recapture model) to assess the probability of detection of calls by both the human analyst and automated detector, based on the assumption of false‐positive‐free adjudicated detections. According to our double‐observer analysis, the automated detector showed superior performance with higher recall and fewer false positives than the original human analyst, and with performance similar to existing top automated detectors. To understand the performance of both detectors we inspected the time‐series and signal‐to‐noise ratio (SNR) of detections for the test dataset, and found that most of the advantages from the automated detector occurred at low and medium SNR.
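The two assessment steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis: the paper fits a closed‐population Huggins mark–recapture model, whereas the sketch below uses the simpler two‐observer (Lincoln–Petersen style) estimator with no covariates, and assumes, as the abstract does, that adjudicated detections are free of false positives. All function names, class names and counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DoubleObserverCounts:
    """Adjudicated true calls, split by who detected them (hypothetical data)."""
    both: int           # found by both the human analyst and the detector
    human_only: int     # found only by the human analyst
    detector_only: int  # found only by the automated detector


def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Standard precision and recall from detection counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def detection_probabilities(c: DoubleObserverCounts):
    """Simplified two-observer estimator (NOT the Huggins model itself).

    Each observer's detection probability is estimated as the fraction
    of the *other* observer's calls that it also found.
    """
    n_human = c.both + c.human_only
    n_detector = c.both + c.detector_only
    p_human = c.both / n_detector
    p_detector = c.both / n_human
    # Estimated total number of calls present, including those both missed.
    n_total = n_human * n_detector / c.both
    return p_human, p_detector, n_total


# Hypothetical counts, not taken from the paper.
counts = DoubleObserverCounts(both=60, human_only=15, detector_only=40)
p_human, p_detector, n_total = detection_probabilities(counts)
precision, recall = precision_recall(true_pos=80, false_pos=20, false_neg=20)
print(f"p(human)={p_human:.2f}, p(detector)={p_detector:.2f}, N={n_total:.0f}")
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With these made‐up counts the detector's estimated detection probability is higher than the analyst's because it recovers a larger share of the calls the other observer found. A Huggins model extends the same idea by allowing detection probability to depend on covariates such as SNR.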
ecology, remote sensing