Feature difference for single-shot object detection

Tao Zeng, Feng Xu, Xin Lyu, Xin Li, Xinyuan Wang, Jiale Chen, Caifeng Wu
DOI: https://doi.org/10.1049/ipr2.12601
IF: 2.3
2022-01-01
IET Image Processing
Abstract: One-stage detectors achieve a good trade-off between performance and latency, owing to their plain architecture and divergent learning mechanisms for classification and localization. However, the two sub-tasks require features of different natures, which leads to inconsistent detections and limits the detectors. In this study, this misalignment is analyzed in depth via kernel density estimation (KDE) for the first time. To address the misalignment, a plug-and-play detection head, named Diff-Head, is devised and embedded in one-stage detectors. Concretely, the authors merge the parallel branches into a semi-parallel structure, establishing a correlation between classification and regression. In the regression branch, a feature difference module (FDM) removes the features that favour classification by subtracting salient object features from the original feature map, and position encoding (PE) modules enhance the absolute position information. The flexibility and efficiency of the detection head are retained. Experiments on Pascal visual object classes (VOC) and MS COCO demonstrate that Diff-Head is effective and achieves performance competitive with state-of-the-art detectors. Meanwhile, the number of parameters is reduced by at least 30%, and 83.0% average precision (AP) is achieved on Pascal VOC. Analyses of consistency and error show that Diff-Head localizes better and is capable of mitigating the misalignment.
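The abstract names kernel density estimation as the analysis tool but does not say which quantities are estimated or how. As a rough illustration only, the sketch below implements a generic one-dimensional Gaussian KDE in plain Python and applies it to two made-up sets of per-detection values. The function name `gaussian_kde_1d`, the sample lists `cls_scores` and `ious`, and the bandwidth rule are all assumptions for this example, not the authors' procedure or data.

```python
import math


def gaussian_kde_1d(samples, bandwidth=None):
    """Return a function that estimates the density of 1-D samples.

    Generic Gaussian KDE; this is an illustrative helper, not code
    from the paper.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        # Normal-reference rule of thumb; an assumption, other rules exist.
        bandwidth = 1.06 * std * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalising constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical values for a handful of detections (made up for illustration):
# classification confidence and localization quality (IoU with ground truth).
cls_scores = [0.91, 0.88, 0.95, 0.60, 0.85, 0.93, 0.72, 0.89]
ious = [0.55, 0.80, 0.62, 0.78, 0.50, 0.66, 0.83, 0.58]

cls_density = gaussian_kde_1d(cls_scores)
iou_density = gaussian_kde_1d(ious)

# Evaluate both densities on a shared grid to compare their shapes.
grid = [i / 10 for i in range(11)]
for x in grid:
    print(f"x={x:.1f}  p_cls={cls_density(x):.3f}  p_iou={iou_density(x):.3f}")
```

If the two estimated densities peak in clearly different places, the samples suggest that high classification confidence and good localization do not coincide, which is the kind of inconsistency the abstract refers to as misalignment.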