Similarity Distance-Based Label Assignment for Tiny Object Detection
Shuohao Shi,Qiang Fang,Tong Zhao,Xin Xu
2024-07-26
Abstract: Tiny object detection is becoming one of the most challenging tasks in computer vision because of the limited object size and lack of information. The label assignment strategy is a key factor affecting the accuracy of object detection. Although there are some effective label assignment strategies for tiny objects, most of them focus on reducing the sensitivity to the bounding boxes to increase the number of positive samples, and they rely on fixed hyperparameters that must be set manually. However, more positive samples do not necessarily lead to better detection results; in fact, excessive positive samples may lead to more false positives. In this paper, we introduce a simple but effective strategy named the Similarity Distance (SimD) to evaluate the similarity between bounding boxes. The proposed strategy not only considers both location and shape similarity but also learns hyperparameters adaptively, ensuring that it can adapt to different datasets and to various object sizes within a dataset. Our approach can be applied directly in common anchor-based detectors in place of the IoU for label assignment and Non-Maximum Suppression (NMS). Extensive experiments on four mainstream tiny object detection datasets demonstrate the superior performance of our method; in particular, on AI-TOD it is 1.8 AP points and 4.1 very-tiny AP points higher than the state-of-the-art competitors. Code is available at: https://github.com/cszzshi/SimD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the label assignment strategy used in tiny object detection. Tiny object detection is especially challenging because the objects are small and carry little information, and the label assignment strategy is one of the key factors affecting detection accuracy.
### Background of the paper and problem description
1. **Limitations of existing methods**:
- Effective label assignment strategies for tiny object detection exist, but most of them focus on reducing the sensitivity to bounding boxes in order to increase the number of positive samples, and they usually require fixed hyperparameters to be set manually.
- An excessive number of positive samples does not necessarily lead to better detection results, but may instead lead to more false positives.
2. **Main challenges in tiny object detection**:
- Information loss: The down-sampling operations used during feature extraction discard much of the information about tiny objects.
- Noise interference: Tiny objects are easily affected by noise.
- Bounding box sensitivity: Tiny objects are more sensitive to small changes in bounding boxes, so with traditional metrics such as IoU very few anchors qualify as positive samples.
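
The bounding box sensitivity described above can be illustrated with a small sketch. It is not taken from the paper: the box sizes, the 2-pixel shift, and the helper names `iou` and `shifted_iou` are illustrative assumptions. It compares the IoU of a box with a copy of itself shifted horizontally, once for a tiny box and once for a larger one.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size x size box and the same box shifted right by `shift` pixels."""
    box = (0.0, 0.0, float(size), float(size))
    moved = (float(shift), 0.0, float(size + shift), float(size))
    return iou(box, moved)


tiny = shifted_iou(4, 2)    # 4x4 box, 2-pixel shift   -> about 0.333
large = shifted_iou(40, 2)  # 40x40 box, 2-pixel shift -> about 0.905
print(f"tiny: {tiny:.3f}, large: {large:.3f}")
```

With a commonly used positive threshold of 0.5 (an assumption here, not a value from the paper), the same 2-pixel offset keeps the larger box a positive sample but rejects the tiny one.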
### Proposed solution
To solve the above problems, the authors propose a new metric, Similarity Distance (SimD), to evaluate the similarity between two bounding boxes. SimD not only considers both position and shape similarity, but also adjusts its hyperparameters adaptively, so that it can adapt to different datasets and to objects of different sizes.
### Main contributions
1. **Proposing the SimD measurement**:
- SimD considers both position similarity and shape similarity, and it adapts to different datasets and object sizes without any manually set hyperparameters.
2. **Experimental verification**:
- Extensive experiments on four mainstream tiny object detection datasets demonstrate the effectiveness of the method. In particular, on the AI-TOD dataset, SimD exceeds the best existing method by 1.8 points in overall AP and by 4.1 points in AP on very tiny objects (2 to 8 pixels).
### Method introduction
#### Definition of Similarity Distance (SimD)
The definition of SimD is as follows:
\[
\text{SimD} = e^{-(\text{sim\_location} + \text{sim\_shape})}
\]
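Here, \( (x_g, y_g) \) and \( (x_a, y_a) \) are the center coordinates of the ground-truth box and the anchor, and \( w \) and \( h \) denote the corresponding widths and heights. The position similarity (sim\_location) and shape similarity (sim\_shape) are defined as: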
\[
\text{sim\_location} = \sqrt{\left(\frac{x_g - x_a}{\frac{1}{m} \times (w_g + w_a)}\right)^2 + \left(\frac{y_g - y_a}{\frac{1}{n} \times (h_g + h_a)}\right)^2}
\]
\[
\text{sim\_shape} = \sqrt{\left(\frac{w_g - w_a}{\frac{1}{m} \times (w_g + w_a)}\right)^2 + \left(\frac{h_g - h_a}{\frac{1}{n} \times (h_g + h_a)}\right)^2}
\]
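
The three formulas above can be turned into a short Python sketch. This is an illustration, not the authors' implementation: the function name `simd`, the `(cx, cy, w, h)` box layout, and the placeholder values `m = n = 1.0` are assumptions made for this example.

```python
import math


def simd(gt, anchor, m, n):
    """Similarity Distance between two boxes given as (cx, cy, w, h).

    m and n are the normalization parameters; they are passed in here
    rather than computed.
    """
    xg, yg, wg, hg = gt
    xa, ya, wa, ha = anchor
    # Shared normalizers: (1/m) * (w_g + w_a) and (1/n) * (h_g + h_a)
    norm_w = (wg + wa) / m
    norm_h = (hg + ha) / n
    sim_location = math.hypot((xg - xa) / norm_w, (yg - ya) / norm_h)
    sim_shape = math.hypot((wg - wa) / norm_w, (hg - ha) / norm_h)
    return math.exp(-(sim_location + sim_shape))


# Two 4x4 boxes whose centers are 4 pixels apart, with placeholder m = n = 1:
# sim_location = 4 / 8 = 0.5, sim_shape = 0, so SimD = exp(-0.5), about 0.607.
score = simd((0, 0, 4, 4), (4, 0, 4, 4), m=1.0, n=1.0)
print(round(score, 3))
```

Identical boxes give a SimD of exactly 1, and the value decays smoothly toward 0 as the boxes move apart or differ in shape, so non-overlapping boxes still receive a graded score, whereas their IoU is always 0.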
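The normalization parameters \( m \) and \( n \) are computed from the training set as the average, over all pairs of ground truths and anchors, of the center distance divided by the summed widths (or heights):
\[
m = \frac{\sum_{i = 1}^M \sum_{j = 1}^{N_i} \sum_{k = 1}^{Q_i} \frac{\left| x_{ij} - x_{ik} \right|}{w_{ij} + w_{ik}}}{\sum_{i = 1}^M N_i \times Q_i}
\]
\[
n = \frac{\sum_{i = 1}^M \sum_{j = 1}^{N_i} \sum_{k = 1}^{Q_i} \frac{\left| y_{ij} - y_{ik} \right|}{h_{ij} + h_{ik}}}{\sum_{i = 1}^M N_i \times Q_i}
\]
Here, \( M \) is the number of images in the training set, and \( N_i \) and \( Q_i \) are the numbers of ground truths and anchors in the \( i \)-th image. The index \( j \) runs over ground truths and \( k \) over anchors, so \( x_{ij} \) and \( w_{ij} \) are the center x-coordinate and width of the \( j \)-th ground truth in image \( i \), while \( x_{ik} \) and \( w_{ik} \) are those of the \( k \)-th anchor (and likewise for \( y \) and \( h \)).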
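
A minimal sketch of how \( m \) and \( n \) could be computed follows. It reads each parameter as the mean, over every ground-truth/anchor pair in every image, of the absolute center offset divided by the summed widths (for \( m \)) or heights (for \( n \)); that reading, the function name `normalization_params`, and the input layout are assumptions for illustration rather than the authors' code.

```python
def normalization_params(images):
    """Compute (m, n) from a dataset.

    images: list of (gts, anchors) pairs, one per image, where gts and
    anchors are lists of boxes in (cx, cy, w, h) form.
    """
    sum_x = 0.0
    sum_y = 0.0
    pairs = 0
    for gts, anchors in images:
        for xg, yg, wg, hg in gts:          # index j: ground truths
            for xa, ya, wa, ha in anchors:  # index k: anchors
                sum_x += abs(xg - xa) / (wg + wa)
                sum_y += abs(yg - ya) / (hg + ha)
        pairs += len(gts) * len(anchors)    # N_i * Q_i for this image
    if pairs == 0:
        raise ValueError("dataset contains no ground-truth/anchor pairs")
    return sum_x / pairs, sum_y / pairs


# One image with one ground truth and two anchors (toy values).
toy = [([(0, 0, 4, 4)], [(4, 0, 4, 4), (0, 2, 4, 4)])]
m, n = normalization_params(toy)
print(m, n)  # 0.25 0.125
```

Because both parameters are statistics of the dataset itself, they change automatically when the dataset or the anchor configuration changes, which is how the method avoids manually tuned hyperparameters.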