Adaptive Scale and Spatial Aggregation for Real-Time Object Detection

Wei Chen, Yulin He, Zhengfa Liang, Yulan Guo
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096147
2023-01-01
Abstract: Cutting-edge real-time detectors usually reach real-time speed by adopting lightweight architectures. Their detection accuracy may be limited by an insufficient capacity to obtain powerful feature representations, which is a notoriously difficult task in machine vision applications. To address this problem, this study proposes adaptive aggregation of features at both the scale and spatial levels in an anchor-free framework: 1) at the scale level, a Multi-scale Point Feature Fusion (MPFF) module fuses point features from multiple scales through self-adaptive re-weighting; 2) at the spatial level, a Restrained Deformable Convolution (R-DCN) focuses on the most informative features within a pre-defined region while avoiding distraction from remote features. Based on R-DCN, an Adaptive Spatial Aggregation (ASA) module alleviates the feature misalignment problem between the classification and regression tasks through their respective spatial divisions. Extensive experimental results on MS COCO indicate that the resulting Adaptive Aggregation Detector (AADet) achieves state-of-the-art detection performance, i.e., 41.8 AP at 60 FPS.
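The abstract does not give the exact formulation of MPFF or R-DCN, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes that the re-weighting across scales is a softmax over per-scale scores and that offsets are restrained to a region by a tanh squashing; both choices, as well as the names `fuse_point_features`, `restrain_offset`, `scores`, and `radius`, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_point_features(features, scores):
    """Fuse feature vectors sampled at one point from several scales.

    features: list of equal-length feature vectors, one per scale.
    scores:   one relevance score per scale (assumed here to be predicted
              by the network; the paper's actual weighting may differ).
    Returns the weighted sum, with weights given by softmax(scores).
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def restrain_offset(raw_dx, raw_dy, radius):
    """Map an unbounded sampling offset into [-radius, radius] per axis.

    tanh squashing is an assumption used for illustration; it keeps every
    sampling location inside a pre-defined square region around the point.
    """
    return radius * math.tanh(raw_dx), radius * math.tanh(raw_dy)


# Two scales, two channels: equal scores give a plain average.
fused = fuse_point_features([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])

# A very large raw offset is still confined to the region of radius 3.
dx, dy = restrain_offset(100.0, -100.0, 3.0)
```

In this sketch a higher score for one scale shifts the fused feature toward that scale, and the restrained offset can never leave the chosen region no matter how large the raw prediction is, which is the behaviour the abstract describes as avoiding remote feature distraction.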