Can the Query-based Object Detector Be Designed with Fewer Stages?

Jialin Li, Weinong Fu, Yu‐Syuan Lin, Qiang Nie, Yong Liu
DOI: https://doi.org/10.48550/arxiv.2309.16306
2023-01-01
Abstract: Query-based object detectors have made significant advancements since the publication of DETR. However, most existing methods still rely on multi-stage encoders and decoders, or a combination of both. Despite achieving high accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers from issues such as heavy computational burden, prompting us to reconsider its necessity. In this paper, we explore multiple techniques to enhance query-based detectors and, based on these findings, propose a novel model called GOLO (Global Once and Local Once), which follows a two-stage decoding paradigm. Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance. Experimental results on the COCO dataset demonstrate the effectiveness of our approach.