ForestDet: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation

Jialian Wu, Liangchen Song, Qian Zhang, Ming Yang, Junsong Yuan
DOI: https://doi.org/10.1109/tmm.2021.3106096
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: Object detection and instance segmentation with a large number of object categories and a long-tailed data distribution are challenging for most existing deep learning models. As the number of classes increases, the outputs of a classifier become sensitive to noisy logits, which can easily result in incorrect recognition. To alleviate the large-vocabulary problem, we cluster fine-grained classes into coarser parent classes and then build a classification tree to classify an object into a fine-grained class via its parent class. Because there are far fewer parent classes, their logits are more stable and can suppress the wrong or noisy logits in the fine-grained class nodes. Because fine-grained classes can be clustered into parent classes in a variety of ways, we can further construct multiple trees to build a classification forest in which each tree contributes its vote to the fine-grained classification. Moreover, a simple yet effective resampling method, termed NMS Resampling, is proposed to address the long-tail (data imbalance) problem. Our method, coined ForestDet, serves as a plug-and-play module that can be readily employed in both one-stage and two-stage object recognition models for recognizing more than 1000 categories. Extensive experiments are conducted on the large-vocabulary dataset LVIS. Compared to the Mask R-CNN baseline, our two-stage counterpart Forest R-CNN significantly improves AP by 11.5% on the rare categories and by 3.9% on overall categories. Compared to the RetinaNet baseline, our one-stage counterpart Forest RetinaNet improves AP by 2.1% on overall categories. Moreover, we achieve state-of-the-art results on the LVIS dataset. Code and models are available at https://github.com/JialianW/Forest_RCNN.
computer science, information systems, telecommunications, software engineering
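
The tree and forest idea described in the abstract can be illustrated with a small sketch. The abstract does not say how parent and fine-grained logits are combined, so the rule below is an assumption made only for illustration: each tree scores a fine-grained class as P(parent) times P(fine class | parent), and the forest "vote" is taken as the average of the per-tree scores. The function names (`tree_scores`, `forest_scores`) and the toy numbers are hypothetical and are not taken from the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tree_scores(fine_logits, parent_logits, parent_of):
    """Score fine classes with one tree.

    Assumed rule (not from the paper): score = P(parent) * P(fine | parent),
    where P(fine | parent) is a softmax over the siblings under that parent.
    parent_of[c] is the parent index of fine class c.
    """
    parent_probs = softmax(parent_logits)
    scores = [0.0] * len(fine_logits)
    for p, p_prob in enumerate(parent_probs):
        children = [c for c, par in enumerate(parent_of) if par == p]
        if not children:
            continue
        child_probs = softmax([fine_logits[c] for c in children])
        for c, c_prob in zip(children, child_probs):
            scores[c] = p_prob * c_prob
    return scores


def forest_scores(fine_logits, trees):
    """Average the per-tree scores; `trees` is a list of (parent_logits, parent_of)."""
    per_tree = [tree_scores(fine_logits, pl, po) for pl, po in trees]
    return [sum(s[c] for s in per_tree) / len(trees) for c in range(len(fine_logits))]


# Toy example: class 2 carries a noisy, inflated fine-grained logit.
fine_logits = [2.0, 1.0, 2.5, 0.0]
trees = [
    ([3.0, 0.0], [0, 0, 1, 1]),  # tree A: classes {0, 1} and {2, 3}
    ([2.0, 0.0], [0, 1, 1, 0]),  # tree B: classes {0, 3} and {1, 2}
]
scores = forest_scores(fine_logits, trees)
flat_winner = max(range(len(fine_logits)), key=lambda c: fine_logits[c])
forest_winner = max(range(len(scores)), key=lambda c: scores[c])
print(flat_winner, forest_winner)
```

In this toy setup the flat classifier picks the noisy class, while both trees place that class under a low-confidence parent, so the averaged forest score favors a different class. Other combination rules (for example, adding logits instead of multiplying probabilities) would fit the abstract's description equally well.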
What problem does this paper attempt to address?