Evaluating the method reproducibility of deep learning models in the biodiversity domain

Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta König-Ries, Jitendra Gaikwad, Luiz Gadelha, Sheeba Samuel
2024-07-10
Abstract: Artificial Intelligence (AI) is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecological findings. This study investigates the reproducibility of deep learning (DL) methods within the biodiversity domain. We design a methodology for evaluating the reproducibility of biodiversity-related publications that employ DL techniques across three stages. We define ten variables essential for method reproducibility, divided into four categories: resource requirements, methodological information, uncontrolled randomness, and statistical considerations. These categories subsequently serve as the basis for defining different levels of reproducibility. We manually extract the availability of these variables from a curated dataset comprising 61 publications identified using the keywords provided by biodiversity experts. Our study shows that the dataset is shared in 47% of the publications; however, a significant number of the publications lack comprehensive information on deep learning methods, including details regarding randomness.
Information Retrieval
What problem does this paper attempt to address?
The paper examines the reproducibility of deep learning (DL) models in biodiversity research. As DL is increasingly applied in ecology, reproducibility matters for transparency, for validating results, and for the credibility of ecological findings. The authors design an evaluation methodology and apply it to 61 biodiversity-related publications that use DL techniques, manually checking ten variables grouped into four categories: resource requirements, methodological information, uncontrolled randomness, and statistical considerations. They find that 47% of the publications share their dataset, but many lack comprehensive information on the DL methods, particularly details about randomness. The paper concludes that the reproducibility of DL methods in biodiversity research is generally low, although it is gradually improving. Its contribution is to raise awareness of the current state of reproducibility in this field and to provide a basis for improving the credibility and impact of these methods.
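To illustrate why missing details about randomness hinder reproduction, the following is a minimal sketch and is not taken from the paper. It uses only the Python standard library; the function `split_dataset`, the sample names, and the seed values are hypothetical. It shows that a train/test split can be regenerated exactly only when the seed is recorded and reported.

```python
import random


def split_dataset(items, test_fraction, seed):
    """Shuffle items with a seeded generator and split them into train/test.

    Hypothetical helper for illustration; a real DL pipeline would also need
    to seed its framework and data loaders.
    """
    rng = random.Random(seed)  # dedicated generator, leaves global state untouched
    shuffled = list(items)     # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample identifiers standing in for a biodiversity image dataset.
samples = [f"image_{i:03d}" for i in range(100)]

# Two runs with the same reported seed yield the same split.
train_a, test_a = split_dataset(samples, 0.2, seed=42)
train_b, test_b = split_dataset(samples, 0.2, seed=42)

# A run with an unreported (here: different) seed will in general differ.
train_c, test_c = split_dataset(samples, 0.2, seed=7)
```

Without the seed, a reader who has both the dataset and the code still cannot recover which samples were held out, so reported metrics cannot be checked against the same test set.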