Decoupling and Interaction: Task Coordination in Single-Stage Object Detection
Jia-Wei Ma, Shu Tian, Haixia Man, Song-Lu Chen, Jingyan Qin, Xu-Cheng Yin
DOI: https://doi.org/10.1007/s11042-024-19257-x
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: In the field of computer vision, general single-stage object detection methods employ two individual subnets within the detection head, serving classification and localization, respectively. However, the lack of explicit modeling of the distinctions and associations between these two tasks makes it difficult to align their spatial feature perception, consequently leading to sub-optimal detection performance. Although some methods use classification to evaluate localization, this is a compromise rather than multi-task optimization. In this paper, we propose a Task-coordinated Single-stage Object Detector (TSOD) to enhance the coordination of multiple tasks. Firstly, we introduce a Task-decoupled Feature Alignment Mechanism (TFAM), which adaptively provides compatible features for different tasks by decoupling spatial information. For classification and localization, the network adaptively samples from category-sensitive regions and boundary-separable regions, respectively. Secondly, we propose a Task-interactive Enhancement Mechanism (TEM), which explicitly combines different task-sensitive features for joint classification score prediction and selects samples with high task consistency for training. This interaction mechanism strengthens the consistency between tasks. We conduct extensive experiments on the COCO, Cityscapes, CrowdHuman, and WiderFace datasets to evaluate the performance of TSOD. The results demonstrate that our model outperforms several state-of-the-art detectors, achieving a 2.0 AP improvement over the baseline on COCO minival and 50.4 AP with single-model, single-scale testing on COCO test-dev. Additionally, our model, equipped with ResNet-50, performs significantly better than other representative detectors on the Cityscapes, CrowdHuman, and WiderFace datasets, showcasing its robustness and generalizability.
Our study contributes a new perspective to the design of single-stage object detectors by emphasizing the importance of decoupling and interaction, both of which are crucial for task coordination. The experimental results validate the effectiveness of the proposed TSOD and its potential as a leading approach in the field. Code is available at https://github.com/Majiawei/tsod-complete .
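The abstract states that TEM selects samples with high task consistency for training but does not give the measure it uses. As a rough illustration only, the sketch below assumes a consistency score of the form s^alpha * u^beta, where s is a candidate's classification score for the ground-truth class and u is the IoU of its predicted box with the ground-truth box. That form, the exponents `alpha` and `beta`, and the function names `box_iou`, `consistency_scores`, and `select_consistent` are all assumptions made for this example and are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consistency_scores(
    cls_scores: Sequence[float],
    ious: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight on classification quality
    beta: float = 1.0,   # hypothetical weight on localization quality
) -> List[float]:
    """Assumed consistency measure: high only when both tasks agree."""
    return [(s ** alpha) * (u ** beta) for s, u in zip(cls_scores, ious)]


def select_consistent(
    cls_scores: Sequence[float],
    ious: Sequence[float],
    k: int,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[int]:
    """Return indices of the k candidates with the highest consistency."""
    scores = consistency_scores(cls_scores, ious, alpha, beta)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Four candidate predictions for one ground-truth object.
cls_scores = [0.9, 0.2, 0.8, 0.6]
ious = [0.3, 0.9, 0.8, 0.7]
positives = select_consistent(cls_scores, ious, k=2)
```

Under this assumed measure, a candidate that scores well on only one task (index 0 is confident but poorly localized, index 1 is well localized but unconfident) ranks below candidates that do reasonably well on both, which is the intuition behind selecting samples by task consistency.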