General Deformable RoI Pooling and Semi-Decoupled Head for Object Detection

Bo Han, Lihuo He, Ying Yu, Wen Lu, Xinbo Gao
DOI: https://doi.org/10.1109/tmm.2024.3391899
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Object detection aims to classify objects of interest within an image and pinpoint their positions using predicted rectangular bounding boxes. However, the classification and localization tasks are heterogeneous: they are not only spatially misaligned but also differ in their properties and feature requirements. Modern detectors commonly share the same spatial region and detection head for both tasks, which makes it difficult for both to reach optimal performance simultaneously and results in inconsistent accuracy. Specifically, a predicted bounding box may have high classification confidence but low localization quality, or vice versa. To tackle this issue, a spatial decoupling mechanism based on general deformable RoI pooling is first proposed. This mechanism separately seeks the favorable regions for classification and for localization, and then extracts the corresponding features. Next, a semi-decoupled head is designed. A fully decoupled head uses independent classification and localization networks, which can lead to excessive decoupling and compromised detection performance; in contrast, the semi-decoupled head lets the two networks enhance each other while each concentrates on its own task. In addition, the semi-decoupled head introduces a redundancy suppression module that filters out redundant, task-irrelevant information from the features extracted by the separate networks and reinforces task-related information. By combining the spatial decoupling mechanism with the semi-decoupled head, the proposed detector achieves 43.7 AP within the Faster R-CNN framework with a ResNet-101 backbone. Without bells and whistles, extensive experimental results on the popular MS COCO dataset demonstrate that the proposed detector surpasses the baseline by a significant margin and outperforms some state-of-the-art detectors.