Enhancing two-stage object detection models via data-driven anchor box optimization in UAV-based maritime SAR

Beigeng Zhao, Rui Song
DOI: https://doi.org/10.1038/s41598-024-55570-z
IF: 4.6
2024-02-28
Scientific Reports
Abstract: The high-altitude imaging capabilities of Unmanned Aerial Vehicles (UAVs) offer an effective solution for maritime Search and Rescue (SAR) operations. In such missions, the accurate identification of boats, personnel, and objects within images is crucial. While object detection models trained on general image datasets can be directly applied to these tasks, their effectiveness is limited due to the unique challenges posed by the specific characteristics of maritime SAR scenarios. Addressing this challenge, our study leverages the large-scale benchmark dataset SeaDronesSee, specific to UAV-based maritime SAR, to analyze and explore the unique attributes of image data in this scenario. We identify the need for optimization in detecting specific categories of difficult-to-detect objects within this context. Building on this, an anchor box optimization strategy is proposed based on clustering analysis, aimed at enhancing the performance of the renowned two-stage object detection models in this specialized task. Experiments were conducted to validate the proposed anchor box optimization method and to explore the underlying reasons for its effectiveness. The experimental results show our optimization method achieved a 45.8% and a 10% increase in average precision over the default anchor box configurations of torchvision and the SeaDronesSee official sample code configuration respectively. This enhancement was particularly evident in the model's significantly improved ability to detect swimmers, floaters, and life jackets on boats within the SeaDronesSee dataset's SAR scenarios. The methods and findings of this study are anticipated to provide the UAV-based maritime SAR research community with valuable insights into data characteristics and model optimization, offering a meaningful reference for future research.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses how to improve the accuracy of two-stage object detection models in unmanned aerial vehicle (UAV)-based maritime search and rescue (SAR) missions. Specifically, it focuses on optimizing anchor boxes with data-driven methods in maritime SAR scenarios, so that detection improves for specific categories such as swimmers, floating objects, and life jackets. Objects in these categories vary greatly in size in maritime SAR images, and the data also suffer from class imbalance and overlapping labels, so general-purpose object detection models cope poorly with this setting. The paper therefore proposes an anchor box optimization strategy based on cluster analysis, aiming to improve the performance of two-stage object detection models in this specialized task.

### Main Research Questions

1. **RQ1**: Given that boats, swimmers on boats, floating objects on boats, and life jackets overlap in the images, and that life jacket labels are markedly scarce, are these categories really difficult to identify?
2. **RQ2**: In UAV-based maritime SAR tasks, can an anchor box optimization strategy further improve the recognition accuracy of two-stage models?
3. **RQ3**: If different anchor box optimization strategies improve overall model accuracy, how exactly do these improvements manifest?

### Method Overview

To answer these research questions, the paper adopts the following methods:

- **Data Analysis**: Analyze the data characteristics of the different object categories in the SeaDronesSee dataset, including object counts and area distributions.
- **Model Selection**: Select two-stage object detection models based on the Faster R-CNN framework, combined with different backbone networks and configurations, with and without a Feature Pyramid Network (FPN).
- **Anchor Box Optimization Strategy**: Compare four anchor box configurations: the torchvision default, the configuration recommended by the official SeaDronesSee example code, IoU-based cluster optimization, and k-means-based cluster optimization.
- **Experimental Setup and Validation Criteria**: Build the models with PyTorch and torchvision, and train and evaluate them on an RTX 4090 GPU. Model performance is measured with the COCO evaluation metrics (average precision and average recall).

### Experimental Results

The experimental results show that the anchor box optimization strategy based on cluster analysis significantly improves the performance of models with FPN in maritime SAR tasks. The optimized anchor box configuration performs especially well on difficult-to-detect objects such as swimmers, floating objects, and life jackets. Compared with the default configuration and the configuration recommended by the official example code, average precision increases by 45.8% and 10% respectively.

### Conclusions

Through a data-driven anchor box optimization strategy, the paper improves the detection accuracy of two-stage object detection models in UAV-based maritime SAR tasks, with the clearest gains on the specific object categories noted above. These methods and findings provide a useful reference for future related research.
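As an illustration of the clustering-based anchor optimization listed in the Method Overview, the sketch below runs k-means over ground-truth box widths and heights, using IoU as the similarity measure. It is a minimal example under stated assumptions, not the authors' implementation: the function names (`wh_iou`, `kmeans_anchors`, `to_size_and_ratio`), the toy box list, and the choice of `k` are all hypothetical, and the summary does not say how the paper's IoU-based and k-means-based variants differ in detail.

```python
import numpy as np


def wh_iou(boxes, centroids):
    """IoU between (N, 2) boxes and (K, 2) centroids given as (width, height).

    Boxes are compared as if they shared the same top-left corner, so only
    their shapes matter, not their positions in the image.
    """
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_centroids = centroids[:, 0] * centroids[:, 1]
    return inter / (area_boxes[:, None] + area_centroids[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor shapes.

    Each box is assigned to the centroid with the highest IoU; each centroid
    is then updated to the mean width and height of its members.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = wh_iou(boxes, centroids).argmax(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Sort by area so that small anchors come first.
    return centroids[np.argsort(centroids.prod(axis=1))]


def to_size_and_ratio(anchors):
    """Convert (width, height) anchors to a side length and a height/width ratio."""
    sizes = np.sqrt(anchors[:, 0] * anchors[:, 1])
    ratios = anchors[:, 1] / anchors[:, 0]
    return sizes, ratios


# Toy (width, height) pairs in pixels: three small objects and three large ones.
# These values are made up for illustration and are not taken from SeaDronesSee.
boxes = np.array(
    [[10, 12], [12, 10], [11, 11], [100, 50], [110, 55], [90, 45]], dtype=float
)
anchors = kmeans_anchors(boxes, k=2)
sizes, ratios = to_size_and_ratio(anchors)
mean_best_iou = wh_iou(boxes, anchors).max(axis=1).mean()
```

In practice, `boxes` would hold the widths and heights of all training annotations, and `mean_best_iou` gives a rough measure of how well the anchor shapes cover them. The resulting sizes and ratios could then be supplied to torchvision's `AnchorGenerator`, which accepts `sizes` and `aspect_ratios`. With an FPN backbone they would need to be grouped per feature-map level, a step this sketch leaves out.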