Data-centric AI approach for automated wildflower monitoring

Gerard Schouten, Bas S.H.T. Michielsen, Barbara Gravendeel
DOI: https://doi.org/10.1101/2024.04.18.590040
2024-04-22
Abstract: Both researchers and policy makers are in need of standards and tools that help in understanding and assessing natural capital. Wildflowers are a major component of our natural capital; they play an essential role in ecosystems, improve soil health, supply food and medicines, and curb climate change. In this paper, we present the Eindhoven Wildflower Dataset (EWD) as well as a PyTorch object detection model that is able to recognize and count wildflowers. EWD, collected over two entire flowering seasons and expert-annotated, contains 2002 top-view images of flowering plants captured ‘in the wild’ in five different landscape types (roadsides, urban green spaces, cropland, weed-rich grassland, marshland). It holds a total of 65571 annotations for 160 species belonging to 31 different families of flowering plants and serves as a reference dataset for automating wildflower monitoring. To ensure consistent annotations, we define specific floral count units (largely based on inflorescences) and provide extensive annotation guidelines. With a 0.82 mAP (@IoU > 0.50) score, the presented baseline model, trained with a balanced subset of EWD, is to the best of our knowledge superior in its class. Our approach empowers automated quantification of wildflower richness and abundance and encourages the development of standards for AI-based wildflower monitoring. The annotated EWD dataset is publicly available on the DataverseNL research data repository, and the code to train and run the baseline model is supplied as supplementary material.
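The 0.82 mAP (@IoU > 0.50) score averages, over species, the area under each species' precision-recall curve, where a detection counts as a true positive only if its box overlaps a not-yet-matched ground-truth box of the same species with an IoU of at least 0.5. The sketch below is a minimal pure-Python version of the common Pascal VOC-style computation (greedy matching in order of descending confidence, all-point interpolation). It is an illustrative assumption, not the evaluation code supplied with the paper; the function names and the `(x1, y1, x2, y2)` box format are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, iou_thr: float = 0.5) -> float:
    """AP for ONE species.

    dets: list of (image_id, score, box) predictions for this species.
    gts:  dict image_id -> list of ground-truth boxes for this species.
    Uses IoU >= iou_thr (VOC convention); the paper writes IoU > 0.50.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp, fp = [], []
    # Highest-confidence detections claim ground-truth boxes first.
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            overlap = iou(box, g)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True  # each ground-truth box is matched once
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)  # no match, or best match already taken: false positive
            fp.append(1)
    precisions, recalls = [], []
    ctp = cfp = 0
    for t, f in zip(tp, fp):
        ctp += t
        cfp += f
        precisions.append(ctp / (ctp + cfp))
        recalls.append(ctp / n_gt)
    # All-point interpolation: make precision non-increasing in recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(dets_by_class, gts_by_class, iou_thr: float = 0.5) -> float:
    """Mean of per-species AP over all species that have ground truth."""
    aps = [
        average_precision(dets_by_class.get(c, []), gts_by_class[c], iou_thr)
        for c in gts_by_class
    ]
    return sum(aps) / len(aps)
```

For example, if a species has two ground-truth flowers in an image and the detector finds one correctly but ranks a false positive above it, that species contributes an AP of 0.25; the mAP is the unweighted mean of such values across species.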
Bioinformatics
What problem does this paper attempt to address?
This paper aims to develop a reliable method that uses artificial intelligence (AI) to automatically identify and count multiple wildflowers in top-view images. Specifically, the authors set out to build an automated monitoring system that copes with cluttered backgrounds, different growth stages, and small inter-class but large intra-class differences (different species can look alike, while a single species can vary widely in appearance). Such a system would make it easier to assess and protect wildflowers as part of our natural capital and to sustain the ecosystem services they provide.

### Specific Background and Challenges of the Problem

1. **Limitations of Existing Methods**:
   - Current wildflower monitoring relies on manual counting, which is time-consuming, labor-intensive and error-prone.
   - Existing AI tools (such as Pl@ntNet or PlantSnap) mainly classify close-up photos of single flowers and cannot handle overview images containing multiple flowers.
   - Datasets usually concentrate on a few "iconic" species, which introduces data bias.
2. **Computer Vision Challenges**:
   - Cluttered background: wildflowers often grow against very complex backgrounds, which makes detection harder.
   - Flower morphological changes: as a flower develops from bud to full bloom and then to fruit, its size, shape and color change.
   - Class imbalance: there are many wildflower species, but some are represented by only a few examples, creating an imbalance between majority and minority classes.

### Solutions

To address these problems, the authors took the following measures:

1. **Create a High-Quality Dataset**:
   - **Eindhoven Wildflower Dataset (EWD)**: an expert-annotated dataset containing 2,002 high-resolution top-view images, covering 160 wildflower species belonging to 31 different plant families.
     These images were collected in five landscape types (roadsides, urban green spaces, cropland, weed-rich grassland and marshland) around Eindhoven, the Netherlands.
   - **Strict Annotation Guidelines**: to ensure consistent and accurate annotations, the authors defined specific floral count units (mainly based on inflorescences) and provided detailed annotation guidelines.
2. **Develop an Object Detection Model**:
   - An object detection model was trained with the PyTorch framework to identify and count multiple wildflowers in overview images.
   - The model was trained on a balanced subset of EWD, and image resolution was preserved by splitting the images into smaller parts.
   - The model reached a score of 0.82 mAP (@IoU > 0.50), which, to the best of the authors' knowledge, is superior to existing models in its class.

### Summary

The main contribution of this study is a data-centric AI approach to automated wildflower monitoring. By creating a high-quality dataset and developing an effective object detection model, the authors provide new tools for large-scale, reliable wildflower monitoring. The results are relevant not only to scientific research but also to ecological conservation and policy-making.
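The summary notes that image resolution was preserved by splitting the images. One common way to do this is overlapping tiling: the detector runs on fixed-size crops, and each crop's boxes are shifted back into full-image coordinates. The sketch below computes such a tile grid under stated assumptions; the 1024 px tile size, the 128 px overlap and all function names are hypothetical illustrations, not values or code from the paper.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)
Window = tuple[int, int, int, int]       # tile window (x0, y0, x1, y1)


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]  # image side shorter than one tile: a single (partial) tile
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far border
    return starts


def make_tiles(width: int, height: int, tile: int = 1024, overlap: int = 128) -> list[Window]:
    """Overlapping tile windows covering a width x height image (hypothetical defaults)."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def to_image_coords(box: Box, window: Window) -> Box:
    """Shift a box predicted inside a tile back into full-image coordinates."""
    x0, y0 = window[0], window[1]
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)
```

Because neighboring tiles overlap, a flower near a tile border can be detected twice; before counting, the merged boxes would typically be de-duplicated, for example with `torchvision.ops.nms(boxes, scores, iou_threshold)`. The overlap trades extra inference cost for fewer flowers being cut in half at tile edges.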