LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation

Yujuan Zhu, Wuzheng Dong
2024-11-24
Abstract: This paper presents LRSAA, a method for object recognition and automatic labeling in large-area remote sensing images. The method integrates the YOLOv11 and MobileNetV3-SSD object detection algorithms through ensemble learning to enhance model performance. It also employs Poisson disk sampling segmentation and the EIOU metric to optimize training and inference on segmented images, and then integrates the per-segment results. This approach reduces the demand for computational resources while achieving a good balance between accuracy and speed. The source code is publicly available at https://github.com/anaerovane/LRSAA.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenges of object recognition and automatic annotation in large-scale remote sensing images. Specifically, it aims to:

1. **Improve model performance**: Integrate two object detection algorithms, YOLOv11 and MobileNetV3-SSD, through ensemble learning to enhance the robustness and accuracy of the model.
2. **Optimize the training and inference process**: Introduce the Poisson disk sampling segmentation technique and the EIOU (Enhanced Intersection over Union) metric to optimize training and inference on segmented images, improving processing speed while maintaining high precision.
3. **Reduce the demand for computing resources**: Use efficient model selection and optimization techniques so that large-scale remote sensing images can be processed with fewer computing resources.
4. **Address the challenges of large-scale image processing**: Propose a method suited to large-area images that maintains good recognition results even under complex conditions.

### Specific problems and solutions

#### 1. Model selection and integration

Most existing object recognition methods for remote sensing images rely on a single model, which can limit generalization and robustness. The paper therefore combines two models, YOLOv11 and MobileNetV3-SSD, through ensemble learning to improve overall performance.

#### 2. Dataset partitioning and processing

Large-scale remote sensing images are usually very large, and processing them directly imposes a heavy computational burden. The paper uses the Poisson disk sampling segmentation technique to divide each large image into multiple small images for processing, and then maps the results back to the original image.
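
The tiling and map-back step described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes simple dart-throwing Poisson disk sampling of tile centers, and the names `poisson_disk_centers`, `tile_origin`, and `to_image_coords`, as well as the tile size and minimum spacing, are hypothetical. Note that random Poisson disk sampling alone does not guarantee that every pixel is covered by some tile.

```python
import math
import random


def poisson_disk_centers(width, height, tile, min_dist, attempts=2000, seed=0):
    """Dart-throwing Poisson disk sampling of tile centers (assumed variant).

    Accepted centers are at least `min_dist` apart and far enough from the
    border that a tile x tile crop stays inside the image.
    """
    rng = random.Random(seed)
    half = tile / 2
    centers = []
    for _ in range(attempts):
        cx = rng.uniform(half, width - half)
        cy = rng.uniform(half, height - half)
        # Reject the candidate if it is too close to an accepted center.
        if all(math.hypot(cx - px, cy - py) >= min_dist for px, py in centers):
            centers.append((cx, cy))
    return centers


def tile_origin(center, tile):
    """Top-left pixel of the tile centered at `center`."""
    return round(center[0] - tile / 2), round(center[1] - tile / 2)


def to_image_coords(box, origin):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    ox, oy = origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


# Hypothetical usage: sample 640x640 tiles from a 4000x3000 image, then map
# a detection found at (10, 20, 30, 40) inside the first tile back.
centers = poisson_disk_centers(4000, 3000, tile=640, min_dist=400)
origin = tile_origin(centers[0], 640)
global_box = to_image_coords((10, 20, 30, 40), origin)
```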
This tiling approach not only improves processing efficiency but also ensures spatial consistency.

#### 3. Balance between precision and speed

To improve processing speed while preserving precision, the paper introduces the EIOU metric to improve the non-maximum suppression (NMS) process. The EIOU loss function is defined as follows:

\[ EIoU = 1 - IoU + \rho^{2} + v \]

where

\[ IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} \]

and \(\rho^{2}\) represents the center point distance loss and \(v\) represents the aspect ratio loss. In this way, the model can obtain more accurate bounding box predictions in a shorter time.

#### 4. Application of synthetic data

To further improve the generalization ability of the model, the paper introduces synthetic data. Synthetic data is generated by randomly sampling real remote sensing images and is added to the original dataset at a certain proportion for retraining, which enhances the adaptability and robustness of the model.

### Experimental verification

The paper uses the XView dataset for preliminary training and validates the method on urban remote sensing images of Tianjin, Shanghai, and Xiamen. The experimental results show that the LRSAA model outperforms other models on multiple evaluation metrics, especially when processing small images of 640×640 and 320×320.

In conclusion, this paper addresses several key problems in object recognition and automatic annotation of large-scale remote sensing images, providing more efficient and accurate support for geographic information system applications.
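
As an illustration of the EIoU expression given in section 3, the following Python sketch evaluates 1 - IoU + ρ² + v for two axis-aligned boxes. The summary does not say how ρ² and v are normalized, so two assumptions are made here and marked in the comments: ρ² is the squared center distance divided by the squared diagonal of the smallest enclosing box, and v is the aspect-ratio term commonly used in CIoU. The function names are hypothetical, and the paper's exact formulation may differ.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def eiou(a, b):
    """Evaluate 1 - IoU + rho^2 + v for two boxes (sketch under assumptions)."""
    # Assumption: rho^2 is the squared center distance normalized by the
    # squared diagonal of the smallest box enclosing both inputs.
    acx, acy = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bcx, bcy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ew = max(a[2], b[2]) - min(a[0], b[0])
    eh = max(a[3], b[3]) - min(a[1], b[1])
    diag2 = ew ** 2 + eh ** 2
    rho2 = ((acx - bcx) ** 2 + (acy - bcy) ** 2) / diag2 if diag2 > 0 else 0.0

    # Assumption: v is the CIoU-style aspect-ratio term; atan2(w, h) equals
    # atan(w / h) for positive heights and avoids dividing by zero.
    angle_a = math.atan2(a[2] - a[0], a[3] - a[1])
    angle_b = math.atan2(b[2] - b[0], b[3] - b[1])
    v = (4 / math.pi ** 2) * (angle_a - angle_b) ** 2

    return 1.0 - iou(a, b) + rho2 + v


same = eiou((0, 0, 10, 10), (0, 0, 10, 10))    # identical boxes
apart = eiou((0, 0, 10, 10), (20, 0, 30, 10))  # disjoint boxes
```

Under these assumptions, identical boxes give a value of 0 and the value grows as boxes overlap less, drift apart, or differ in shape. How this quantity is wired into the NMS step in LRSAA is not specified in the summary above.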