Efficient object detector via dynamic prior and dynamic feature fusion

Zihang Zhang, Yuling Liu, Zhili Zhou, Gaobo Yang, Q M Jonathan Wu
DOI: https://doi.org/10.1093/comjnl/bxae082
2024-09-16
The Computer Journal
Abstract: Sparse R-CNN is a new paradigm of object detection that predicts objects in a sparse way. However, it has two limitations. First, its learnable proposal boxes and features are fixed across different images, so they carry only weak prior information and the model needs excessive iterations to refine its predictions. Second, it does not adequately exploit multi-scale information, which leads to sub-optimal detection performance. Building upon Sparse R-CNN, we therefore propose an efficient detector that incorporates dynamic prior and dynamic feature fusion, called $D^{2}$-Det. For the dynamic prior part, a prior information generator module dynamically generates proposal features and boxes for each image, alleviating the inference-inefficient iterative refinement of predictions, and we further propose a class scores decoupling method to reduce the computation overhead. For the dynamic feature fusion part, we develop a novel lightweight multi-scale feature fusion module that dynamically aggregates features from all layers for each proposal box, enabling adaptive feature fusion and improving detection precision by nearly 2 AP. Experiments show that $D^{2}$-Det achieves 46.6 AP on COCO 2017 with a ResNet50 backbone and fewer computations, surpassing most state-of-the-art detectors.
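The abstract does not specify how the fusion module computes its per-proposal aggregation, so the following is only a rough sketch of one plausible reading: each proposal predicts softmax weights over the pyramid levels and takes a weighted sum of its pooled features. The function `dynamic_fuse`, the gating matrix `w_gate`, and all tensor shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_fuse(roi_feats, proposal_feats, w_gate):
    """Fuse multi-level RoI features with per-proposal weights.

    roi_feats:      (N, L, C) features pooled for N proposals from L levels
    proposal_feats: (N, D)    one feature vector per proposal
    w_gate:         (D, L)    hypothetical projection producing level logits
    Returns the fused features (N, C) and the level weights (N, L).
    """
    logits = proposal_feats @ w_gate                      # (N, L)
    weights = softmax(logits, axis=1)                     # each row sums to 1
    fused = np.einsum("nl,nlc->nc", weights, roi_feats)   # weighted sum over levels
    return fused, weights
```

With an all-zero `w_gate` every level receives equal weight, so the sketch reduces to plain averaging across levels; a learned gate would let each proposal emphasise the levels that match its scale.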
computer science, information systems, theory & methods, software engineering, hardware & architecture