Abstract: This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage. We review the training process and attribute the overlooked phenomenon to two limitations: lack of training emphasis and cascading errors from the decoding sequence. We design and present Selective Query Recollection (SQR), a simple and effective training strategy for query-based object detectors. It cumulatively collects intermediate queries as decoding stages go deeper and selectively forwards the queries to the downstream stages aside from the sequential structure. In this way, SQR places training emphasis on later stages and allows later stages to work with intermediate queries from earlier stages directly. SQR can be easily plugged into various query-based object detectors and significantly enhances their performance while leaving the inference pipeline unchanged. We apply SQR to AdaMixer, DAB-DETR, and Deformable-DETR across various settings (backbone, number of queries, schedule), and it consistently brings a 1.4-2.8 AP improvement.

What problem does this paper attempt to address?

The paper addresses a failure mode in query-based object detectors: the model predicts correctly at an intermediate decoding stage but mispredicts at the final stage. The authors attribute this to two limitations of current training strategies: (1) a lack of training emphasis on the later stages, and (2) cascading errors in the decoding sequence.

### Specific problem description

1. **Lack of training emphasis on the later stages**: In the existing training process, all decoding stages are supervised in the same way. However, errors made in the early stages can still be corrected downstream, while the later stages bear more responsibility for the final prediction. The later stages therefore need more training attention.
2. **Cascading errors in the decoding sequence**: Because the decoder is sequential, once an intermediate query has been refined by a stage, it is passed on to the subsequent stages regardless of whether the refinement helped or hurt. Errors therefore cascade, convergence becomes harder, and the later stages never see the earlier queries during training.

### Proposed solution

To address these problems, the authors propose the Selective Query Recollection (SQR) strategy. Its main features are:

- **Cumulatively collect intermediate queries**: As decoding goes deeper, intermediate queries are cumulatively collected.
- **Selectively forward queries**: In addition to the sequential path, queries are selectively forwarded to the downstream stages, so that later stages can work directly with queries from earlier stages.
- **Strengthen supervision in the later stages**: The later stages receive more supervision signals and therefore more training attention.
- **Alleviate the impact of cascading errors**: Giving the later stages direct access to queries from the earlier stages reduces the impact of cascading errors.

### Experimental results

The authors applied SQR to several query-based object detectors (AdaMixer, DAB-DETR, and Deformable-DETR) under different settings (backbone, number of queries, schedule). SQR consistently improved performance, raising AP by 1.4-2.8 points while leaving the inference pipeline unchanged.

### Summary

SQR targets two weaknesses in the training of query-based object detectors: insufficient training of the later stages and cascading errors. It improves model performance without changing inference and offers a new direction for further optimizing query-based object detectors.
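To make the collect-and-forward idea concrete, here is a minimal, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not spell out the selection rule, so the sketch assumes that each stage receives the query sets produced by the two preceding stages. The `window` parameter, the toy stages, and all function names are hypothetical. The sketch shows how many outputs each stage would have supervised during training, and that the plain sequential path used at inference is unchanged.

```python
from typing import Callable, List, Optional

Query = List[float]                # toy stand-in for a set of query embeddings
Stage = Callable[[Query], Query]   # one decoder stage refines a query


def make_toy_stage(step: float) -> Stage:
    """Hypothetical stage: adds `step` to every value (placeholder for real refinement)."""
    def stage(query: Query) -> Query:
        return [value + step for value in query]
    return stage


def recollection_forward(stages: List[Stage], initial: Query,
                         window: Optional[int] = 2) -> List[List[Query]]:
    """Training-time pass; returns, per stage, every output that would get a loss.

    window=1    -> plain sequential decoding (one output per stage)
    window=2    -> ASSUMED selective rule: reuse outputs of the two preceding stages
    window=None -> dense variant: reuse everything collected so far
    """
    produced: List[List[Query]] = [[initial]]  # slot 0 holds the initial query
    for stage in stages:
        recent = produced if window is None else produced[-window:]
        inputs = [query for group in recent for query in group]
        produced.append([stage(query) for query in inputs])
    return produced[1:]


def inference_forward(stages: List[Stage], initial: Query) -> Query:
    """Inference keeps the ordinary sequential pipeline."""
    query = initial
    for stage in stages:
        query = stage(query)
    return query


stages = [make_toy_stage(float(i)) for i in range(1, 7)]   # six toy stages
outputs = recollection_forward(stages, [0.0], window=2)
print([len(group) for group in outputs])   # [1, 2, 3, 5, 8, 13]
print(outputs[-1][-1] == inference_forward(stages, [0.0]))  # True
```

Under this assumed rule, the number of supervised outputs grows with depth, so the later stages receive more training signal, and each later stage also sees queries that skipped the stage immediately before it. The last output of the final stage is the ordinary sequential result, which is why inference needs no change.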

Enhanced Training of Query-Based Object Detection via Selective Query Recollection
