Boosting Long-tailed Object Detection via Step-wise Learning on Smooth-tail Data

Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee
2023-05-22
Abstract: Real-world data tends to follow a long-tailed distribution, where the class imbalance results in dominance of the head classes during training. In this paper, we propose a frustratingly simple but effective step-wise learning framework to gradually enhance the capability of the model in detecting all categories of long-tailed datasets. Specifically, we build smooth-tail data where the long-tailed distribution of categories decays smoothly to correct the bias towards head classes. We pre-train a model on the whole long-tailed data to preserve discriminability between all categories. We then fine-tune the class-agnostic modules of the pre-trained model on the head class dominant replay data to get a head class expert model with improved decision boundaries from all categories. Finally, we train a unified model on the tail class dominant replay data while transferring knowledge from the head class expert model to ensure accurate detection of all categories. Extensive experiments on long-tailed datasets LVIS v0.5 and LVIS v1.0 demonstrate the superior performance of our method, where we improve the AP with a ResNet-50 backbone from 27.0% to 30.3%, and especially for the rare categories from 15.5% to 24.9%. Our best model using a ResNet-101 backbone achieves 30.7% AP, which surpasses all existing detectors using the same backbone.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses class imbalance in object detection on long-tailed datasets and proposes a step-wise learning solution. The problem, contributions, and results can be summarized as follows:

1. **Problem Background**: In real-world datasets, object categories often follow a long-tailed distribution: a few categories (head categories) have a large number of instances, while most categories (tail categories) have only a few samples. This imbalance biases the trained model towards the head categories and significantly reduces detection performance on the tail categories.
2. **Main Contributions and Solutions**:
   - A method for constructing "smooth-tail" data is proposed. The dataset is reorganized into two subsets: one composed primarily of head categories plus a small number of tail-category samples (head-dominated data), and another composed mainly of tail categories plus a few head-category samples (tail-dominated data). This alleviates the extreme class imbalance of the original long-tailed distribution and reduces catastrophic forgetting.
   - A progressive learning framework is designed that combines fine-tuning and knowledge transfer for long-tailed object detection. First, the model is pre-trained on the complete long-tailed dataset to retain discriminative ability among all categories. Then, only the category-specific modules of the pre-trained model are updated, while the category-agnostic modules are kept unchanged, to obtain a head category expert model. Finally, a unified model is trained on the tail-dominated data, using knowledge transfer from the head category expert model to ensure accurate detection of all categories.
3. **Experimental Results**: Extensive experiments were conducted on the LVIS v0.5 and LVIS v1.0 datasets. The results show that the proposed framework improves overall accuracy, especially for rare categories. For example, on LVIS v0.5 with a ResNet-50 backbone, the average precision (AP) increased from 27.0% to 30.3%, and the AP for rare categories increased from 15.5% to 24.9%.

In summary, the method addresses class imbalance in object detection on long-tailed datasets through progressive learning and knowledge transfer, with the largest gains on tail categories.
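The head-dominated and tail-dominated subsets described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `build_smooth_tail_subsets`, the `head_threshold` and `replay_per_class` parameters, and the flat `(image_id, category_id)` annotation format are all assumptions made for illustration. Real detection images contain several categories at once, which this simplification ignores.

```python
import random
from collections import Counter, defaultdict


def build_smooth_tail_subsets(annotations, head_threshold=100,
                              replay_per_class=2, seed=0):
    """Split annotations into head-dominated and tail-dominated subsets.

    annotations: list of (image_id, category_id) pairs (assumed format).
    head_threshold: categories with at least this many annotations are
        treated as head categories (hypothetical cutoff).
    replay_per_class: number of replay samples each category contributes
        to the subset it does NOT dominate (hypothetical value).
    """
    rng = random.Random(seed)

    # Count instances per category to decide which categories are "head".
    counts = Counter(cat for _, cat in annotations)
    head_classes = {c for c, n in counts.items() if n >= head_threshold}

    # Group annotations by category.
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann[1]].append(ann)

    head_dominant, tail_dominant = [], []
    for cat, anns in by_class.items():
        # A few samples of every category are replayed in the other subset.
        replay = rng.sample(anns, min(replay_per_class, len(anns)))
        if cat in head_classes:
            head_dominant.extend(anns)    # all head samples
            tail_dominant.extend(replay)  # a few head samples as replay
        else:
            tail_dominant.extend(anns)    # all tail samples
            head_dominant.extend(replay)  # a few tail samples as replay
    return head_dominant, tail_dominant


if __name__ == "__main__":
    # Toy data: category 0 is frequent, categories 1 and 2 are rare.
    toy = [(i, 0) for i in range(10)] + [(100 + i, 1) for i in range(3)] + [(200, 2)]
    head_set, tail_set = build_smooth_tail_subsets(toy, head_threshold=5)
    print(Counter(c for _, c in head_set))
    print(Counter(c for _, c in tail_set))
```

Both subsets keep at least a few samples of every category, which is the property the summary credits with softening the imbalance and limiting catastrophic forgetting between training stages.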