Enhancing predictive performance for spectroscopic studies in wildlife science through a multi-model approach: A case study for species classification of live amphibians

Li-Dunn Chen, Michael A. Caprio, Devin M. Chen, Andy J. Kouba, Carrie K. Kouba
DOI: https://doi.org/10.1371/journal.pcbi.1011876
2024-02-15
PLoS Computational Biology
Abstract: Near-infrared spectroscopy coupled with predictive modeling is a growing field of study for addressing questions in wildlife science aimed at improving management strategies and conservation outcomes for managed and threatened fauna. To date, the majority of spectroscopic studies in wildlife and fisheries have applied chemometrics and predictive modeling with a single-algorithm approach. By contrast, multi-model approaches are used routinely for analyzing spectroscopic datasets across many major industries (e.g., medicine, agriculture) to maximize predictive outcomes for real-world applications. In this study, we conducted a benchmark modeling exercise to compare the performance of several machine learning algorithms in a multi-class problem utilizing a multivariate spectroscopic dataset obtained from live animals. Spectra obtained from live individuals representing eleven amphibian species were classified according to taxonomic designation. Seven modeling techniques were applied to generate prediction models, which varied significantly (p < 0.05) with regard to mean classification accuracy (e.g., support vector machine: 95.8 ± 0.8% vs. K-nearest neighbors: 89.3 ± 1.0%). Through the use of a multi-algorithm approach, candidate algorithms can be identified and applied to more effectively model complex spectroscopic data collected for wildlife sciences. Other key considerations in the predictive modeling workflow that serve to optimize spectroscopic model performance (e.g., variable selection and cross-validation procedures) are also discussed.
We explored the use of a multi-model approach for analyzing a spectroscopic dataset relevant to wildlife sciences. To date, the vast majority of spectroscopy studies for wildlife sciences have assessed spectral datasets with a single-algorithm approach, but comparing multiple classification models may yield better predictive performance for the given question and study system.
This study utilized near-infrared (NIR) spectra collected from the dermal surface of live amphibians across eleven species. We first demonstrate that NIR spectra collected from live amphibians produce unique spectral signatures characteristic of each species. Next, we evaluated the classification accuracy of seven machine learning algorithms for NIR spectra of unknown origin. Significant differences in classification accuracy were observed across model algorithms. With NIR data already digitized, a multi-model analysis may enhance the predictive performance of existing or newly generated predictive models, which could lead to more robust and better-informed decision-making regarding wildlife management practices based on spectroscopic data.
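The multi-model comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectra are synthetic, only two simple classifiers are compared (K-nearest neighbors, which the abstract names, and a nearest-centroid rule chosen here purely as a stand-in), and every function name and parameter (`make_spectra`, `cross_validate`, `benchmark`, the fold count, the noise level) is an assumption made for illustration. It shows only the general shape of the exercise: score each candidate model under the same cross-validation folds and report mean accuracy with its spread.

```python
import random
import statistics


def make_spectra(n_species=3, n_per_species=30, n_bands=20, noise=0.3, seed=0):
    """Synthetic stand-in for NIR spectra: one noisy template per species."""
    rng = random.Random(seed)
    data = []
    for species in range(n_species):
        template = [rng.uniform(0.0, 1.0) for _ in range(n_bands)]
        for _ in range(n_per_species):
            spectrum = [v + rng.gauss(0.0, noise) for v in template]
            data.append((spectrum, species))
    rng.shuffle(data)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, x, k=5):
    """K-nearest neighbors: majority vote among the k closest training spectra."""
    nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def centroid_predict(train, x):
    """Nearest centroid: assign the species whose mean spectrum is closest."""
    by_label = {}
    for spectrum, label in train:
        by_label.setdefault(label, []).append(spectrum)
    centroids = {
        label: [statistics.fmean(col) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def cross_validate(data, predict, n_folds=5):
    """Return per-fold accuracy for one model under k-fold cross-validation."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(predict(train, x) == y for x, y in test)
        scores.append(hits / len(test))
    return scores


def benchmark(data, models, n_folds=5):
    """Map model name -> (mean accuracy, standard deviation across folds)."""
    results = {}
    for name, predict in models.items():
        scores = cross_validate(data, predict, n_folds)
        results[name] = (statistics.fmean(scores), statistics.stdev(scores))
    return results


# Hypothetical candidate set; the study compared seven algorithms.
MODELS = {"k-nearest neighbors": knn_predict, "nearest centroid": centroid_predict}

if __name__ == "__main__":
    for name, (mean, sd) in benchmark(make_spectra(), MODELS).items():
        print(f"{name}: {mean:.3f} ± {sd:.3f}")
```

Because every model is scored on identical folds, differences in mean accuracy reflect the algorithms rather than the data split. A real analysis would substitute measured spectra and a fuller set of candidate algorithms.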
biochemical research methods, mathematical & computational biology