Improved detection of discarded fish species through BoxAL active learning

Maria Sokolova,Pieter M. Blok,Angelo Mencarelli,Arjan Vroegop,Aloysius van Helmond,Gert Kootstra
2024-10-07
Abstract:In recent years, powerful data-driven deep-learning techniques have been developed and applied for automated catch registration. However, these methods are dependent on the labelled data, which is time-consuming, labour-intensive, expensive to collect and need expert knowledge. In this study, we present an active learning technique, named BoxAL, which includes estimation of epistemic certainty of the Faster R-CNN object-detection model. The method allows selecting the most uncertain training images from an unlabeled pool, which are then used to train the object-detection model. To evaluate the method, we used an open-source image dataset obtained with a dedicated image-acquisition system developed for commercial trawlers targeting demersal species. We demonstrated, that our approach allows reaching the same object-detection performance as with the random sampling using 400 fewer labelled images. Besides, mean AP score was significantly higher at the last training iteration with 1100 training images, specifically, 39.0&plusmn;1.6 and 34.8&plusmn;1.8 for certainty-based sampling and random sampling, respectively. Additionally, we showed that epistemic certainty is a suitable method to sample images that the current iteration of the model cannot deal with yet. Our study additionally showed that the sampled new data is more valuable for training than the remaining unlabeled data. Our software is available on <a class="link-external link-https" href="https://github.com/pieterblok/boxal" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### What problems does this paper attempt to solve? This paper aims to solve the challenges faced by automatic detection of discarded fish species in fishery monitoring, especially during bottom trawling. Specifically, the author proposes an active learning - based method, BoxAL, to reduce the need for labeled training data, thereby improving the efficiency and accuracy of fish detection. #### Background problems 1. **High cost of obtaining labeled data**: Traditional deep - learning methods rely on a large amount of labeled data, and the acquisition of these data is time - consuming, expensive, and requires expert knowledge. 2. **Complexity of discarded fish species monitoring**: There are a large number of variations and occlusions in the images of discarded fish species, which increases the difficulty of model training. Different species of fish have different appearances in the images, which are affected by factors such as natural variation, age, and seasonal changes. In addition, the position of fish on the conveyor belt will also affect the detection effect. 3. **Limitations of existing methods**: Traditional manual observation and random sampling methods cannot efficiently process large - scale discarded fish species data and are difficult to achieve comprehensive coverage. #### Research objectives 1. **Reduce the need for labeled data**: Select the most informative unlabeled images for labeling through the active learning method, thereby reducing the amount of required labeled data. 2. **Improve detection performance**: Verify whether the proposed BoxAL method can achieve the same detection performance as the random sampling method when using less labeled data. 3. **Explore the effectiveness of uncertainty estimation**: Study the effectiveness of model uncertainty (epistemic certainty) in selecting training samples, especially how to select those images that the current model cannot handle correctly. #### Main contributions - Proposed the BoxAL method, which combines the Faster R - CNN object detection model and the Monte Carlo dropout technique to estimate the uncertainty of the model. - Verified through experiments that the BoxAL method can significantly improve the performance of fish detection while reducing the need for labeled data. - Found that the uncertainty - based active learning method can more effectively select training samples that are helpful for model improvement. ### Summary This paper solves the problems of high data - labeling cost and image complexity in the automatic detection of discarded fish species by introducing the BoxAL active learning method, providing a more efficient and economical solution for fishery monitoring.