Robust ecological analysis of camera trap data labelled by a machine learning model
Robin C. Whytock, Jędrzej Świeżewski, Joeri A. Zwerts, Tadeusz Bara‐Słupski, Aurélie Flore Koumba Pambo, Marek Rogala, Laila Bahaa‐el‐din, Kelly Boekee, Stephanie Brittain, Anabelle W. Cardoso, Philipp Henschel, David Lehmann, Brice Momboua, Cisquet Kiebou Opepa, Christopher Orbell, Ross T. Pitman, Hugh S. Robinson, Katharine A. Abernethy
DOI: https://doi.org/10.1111/2041-210X.13576
2021-03-05
Methods in Ecology and Evolution
Abstract:
1. Ecological data are collected over vast geographic areas using digital sensors such as camera traps and bioacoustic recorders. Camera traps have become the standard method for surveying many terrestrial mammals and birds, but camera trap arrays often generate millions of images that are time‐consuming to label. This causes significant latency between data collection and subsequent inference, which impedes conservation at a time of ecological crisis. Machine learning algorithms have been developed to improve the speed of labeling camera trap data, but it is uncertain how the outputs of these models can be used in ecological analyses without secondary validation by a human.
2. Here, we present our approach to developing, testing and applying a machine learning model to camera trap data for the purpose of achieving fully automated ecological analyses. As a case‐study, we built a model to classify 26 Central African forest mammal and bird species (or groups). The model generalizes to new spatially and temporally independent data (n = 227 camera stations, n = 23,868 images), and outperforms humans in several respects (e.g. detecting 'invisible' animals). We demonstrate how ecologists can evaluate a machine learning model's precision and accuracy in an ecological context by comparing species richness, activity patterns (n = 4 species tested) and occupancy (n = 4 species tested) derived from machine learning labels with the same estimates derived from expert labels.
3. Results show that fully automated species labels can be equivalent to expert labels when calculating species richness, activity patterns (n = 4 species tested) and estimating occupancy (n = 3 of 4 species tested) in a large, completely out‐of‐sample test dataset. Simple thresholding using the Softmax values (i.e. excluding 'uncertain' labels) improved the model's performance when calculating activity patterns and estimating occupancy but did not improve estimates of species richness.
4. We conclude that, with adequate testing and evaluation in an ecological context, a machine learning model can generate labels for direct use in ecological analyses without the need for manual validation. We provide the user‐community with a multi‐platform, multi‐language graphical user interface that can be used to run our model offline.
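The Softmax thresholding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the class names and the threshold of 0.7 are hypothetical and chosen only for illustration. The sketch converts a model's raw class scores to Softmax values, keeps the top label only when its Softmax value reaches the threshold, and computes species richness as the number of distinct retained labels.

```python
import math


def softmax(logits):
    """Convert raw class scores to values that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits, class_names, threshold=0.7):
    """Return (label, softmax value) for the top class.

    The label is None when the top Softmax value is below `threshold`,
    i.e. the image is treated as 'uncertain' and excluded.
    The default of 0.7 is an illustrative assumption, not a value from the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return class_names[best], probs[best]


def species_richness(labels):
    """Count distinct labels, ignoring excluded (None) images."""
    return len({label for label in labels if label is not None})


# Hypothetical class names and scores, for illustration only.
CLASS_NAMES = ["elephant", "leopard", "blank"]
confident = label_with_threshold([5.0, 0.0, 0.0], CLASS_NAMES)
uncertain = label_with_threshold([1.0, 0.9, 0.8], CLASS_NAMES)
richness = species_richness([confident[0], uncertain[0], "leopard", "leopard"])
```

Raising the threshold excludes more images, which trades sample size for label reliability. That trade-off is one plausible reason thresholding could help activity and occupancy estimates without improving species richness, since discarding uncertain images can also discard the only detections of a rarely photographed species.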