OBBStacking: An Ensemble Method for Remote Sensing Object Detection

Haoning Lin, Changhao Sun, Yunpeng Liu
DOI: https://doi.org/10.48550/arXiv.2209.13369
2022-09-27
Abstract: Ensemble methods are a reliable way to combine several models to achieve superior performance. However, the application of ensemble methods to remote sensing object detection has been mostly overlooked. Two problems arise. First, one unique characteristic of remote sensing object detection is the Oriented Bounding Boxes (OBB) of the objects, and the fusion of multiple OBBs requires further research attention. Second, the widely used deep learning object detectors provide a score for each detected object as an indicator of confidence, but how to use these indicators effectively in an ensemble method remains a problem. To address these problems, this paper proposes OBBStacking, an ensemble method that is compatible with OBBs and combines the detection results in a learned fashion. This ensemble method helped take 1st place in the Challenge Track *Fine-grained Object Recognition in High-Resolution Optical Images*, which was featured in the *2021 Gaofen Challenge on Automated High-Resolution Earth Observation Image Interpretation*. Experiments on the DOTA and FAIR1M datasets demonstrate the improved performance of OBBStacking, and the features of OBBStacking are analyzed.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two problems:

1. **Fusing Oriented Bounding Boxes (OBBs)**: Oriented bounding boxes are a distinctive characteristic of remote sensing object detection. Unlike traditional horizontal bounding boxes, an OBB can represent an object at an arbitrary angle. How to fuse the OBBs produced by multiple models effectively therefore requires further research.
2. **Using detector confidence scores effectively**: Deep learning object detectors assign each detected object a confidence score as an indicator of its correctness. How to use these scores effectively in an ensemble method remains a challenge.

To address these problems, the paper proposes **OBBStacking**, an OBB-compatible ensemble method that combines multiple detection results in a learned manner. Specifically, OBBStacking tackles two key issues:

- **Model calibration, redundancy, and performance gap**: OBBStacking trains a meta-learner that learns how to weight the results of the member models, accounting for model calibration, redundancy, and performance differences.
- **Fusion of Oriented Bounding Boxes**: OBBStacking proposes a new bounding-box fusion method suited to OBBs. The method parameterizes a bounding box by position, width, height, and orientation, and fuses each parameter separately.

With these contributions, OBBStacking helped take first place in the Challenge Track *Fine-grained Object Recognition in High-Resolution Optical Images*, and its performance improvement was verified by experiments on the DOTA and FAIR1M datasets.
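The learned combination described above can be illustrated with a minimal sketch of a logistic-regression meta-learner fitted by gradient descent on the negative log-likelihood. This is a simplification under stated assumptions, not the authors' implementation: each detection is assumed to carry one scalar logit per member model, the label `y` is assumed to be 1 for a correct detection and 0 otherwise, and the names `fit_meta_learner` and `nll` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def nll(z, y, w, b):
    """Negative log-likelihood of binary labels y under sigmoid(z @ w + b)."""
    p = sigmoid(z @ w + b)
    eps = 1e-12  # guards against log(0)
    return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))


def fit_meta_learner(z, y, lr=0.1, steps=2000):
    """Fit one weight per member model plus an intercept.

    z: (n, M) array of member logits (assumed layout: one row per detection).
    y: (n,) array of 0/1 labels (assumed: 1 means the detection is correct).
    """
    n, m = z.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(z @ w + b)
        grad = p - y                 # gradient of the NLL w.r.t. the pre-activation
        w -= lr * (z.T @ grad) / n   # averaged gradient step on the weights
        b -= lr * grad.mean()        # averaged gradient step on the intercept
    return w, b
```

In this sketch, a member whose logits carry little information about correctness should end up with a weight near zero, which is one way a learned combination can account for performance gaps between members.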
### Formula summary

- **Form of the meta-learner**:

  \[ \sigma_{\text{WA}}(z)=\sigma(zw+b) \]

  where \(z=[z_1,z_2,\dots,z_M]\in\mathbb{R}^{2\times M}\) is the logit output from the \(M\) member models, \(\sigma(z)=\frac{1}{1+\exp(-z)}\) is the logistic function, and \(w\in\mathbb{R}^M\) and \(b\in\mathbb{R}\) are the weight and intercept parameters of the meta-learner, respectively.

- **Negative log-likelihood loss function**:

  \[ L=-\sum_{i=1}^{n}\log\left(\sigma_{\text{WA}}(z_i)^{(y_i)}\right)=-\sum_{i=1}^{n}\log\left(\sigma(z_iw+b)^{(y_i)}\right) \]

- **Formula for Oriented Bounding Box fusion**:

  \[ o_{\text{fused}}^{(j)}=\frac{\sum_{p=1}^{n}o_p^{(j)}s_p^*}{\sum_{p=1}^{n}s_p^*},\quad j=1,2,3,4 \]

  where \(s_p^*=\sigma\left(z_p^{(1)}w^{(l_p)}+b\right)\), and \(l_p\) is the index of the source model.

- **Direction parameter fusion**:

  \[ \theta_f=\frac{\sum_{p=1}^{n}r(\theta_p,\theta_{MJ})s_p^*}{\sum_{p=1}^{n}s_p^*}+\theta_{MJ} \]

  where the definition of \(r(\theta_1,\theta_{MJ})\) is incomplete here.
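The fusion formula for the four non-angular box parameters can be sketched as a score-weighted average. This is a hedged illustration rather than the authors' code: the parameter order `(cx, cy, width, height)`, the function name `fuse_box_params`, and the assumption that the input predictions have already been grouped as detections of the same object are all hypothetical. Orientation fusion is left out because the definition of \(r\) is not available here.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_box_params(params, logits, model_idx, w, b):
    """Weighted average of box parameters using calibrated scores.

    params:    (n, 4) array, assumed order (cx, cy, width, height).
    logits:    (n,) array, one logit per prediction (z_p^{(1)} in the formula).
    model_idx: (n,) integer array, index l_p of the model behind each prediction.
    w, b:      meta-learner weights (one per model) and intercept.
    """
    params = np.asarray(params, dtype=float)
    logits = np.asarray(logits, dtype=float)
    weights = np.asarray(w, dtype=float)[np.asarray(model_idx)]
    s = sigmoid(logits * weights + b)  # calibrated score s_p^* per prediction
    return (params * s[:, None]).sum(axis=0) / s.sum()


# Two predictions of the same object from two different member models.
fused = fuse_box_params(
    params=[[0.0, 0.0, 2.0, 4.0], [2.0, 2.0, 4.0, 8.0]],
    logits=[0.0, 0.0],
    model_idx=[0, 1],
    w=[1.0, 1.0],
    b=0.0,
)
```

With equal logits and equal model weights, both scores are 0.5, so `fused` reduces to the plain mean of the two boxes. A prediction with a much larger calibrated score would dominate the average instead.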