Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Fangyi Chen, Han Zhang, Kai Hu, Yu-kai Huang, Chenchen Zhu, Marios Savvides
DOI: https://doi.org/10.48550/arXiv.2212.07593
2023-03-22
Abstract: This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage. We review the training process and attribute the overlooked phenomenon to two limitations: lack of training emphasis and cascading errors from decoding sequence. We design and present Selective Query Recollection (SQR), a simple and effective training strategy for query-based object detectors. It cumulatively collects intermediate queries as decoding stages go deeper and selectively forwards the queries to the downstream stages aside from the sequential structure. In this way, SQR places training emphasis on later stages and allows later stages to work with intermediate queries from earlier stages directly. SQR can be easily plugged into various query-based object detectors and significantly enhances their performance while leaving the inference pipeline unchanged. We apply SQR to Adamixer, DAB-DETR, and Deformable-DETR across various settings (backbone, number of queries, schedule), and it consistently brings a 1.4-2.8 AP improvement.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a failure mode of query-based object detectors: the model mispredicts at the final decoding stage even though it predicts correctly at an intermediate stage. The authors attribute this to two limitations of current training strategies: (1) a lack of training emphasis on the later stages, and (2) cascading errors in the decoding sequence.

### Specific problem description

1. **Lack of training emphasis on the later stages**:
   - In the existing training process, all decoding stages are supervised in the same way. Errors made in the early stages can still be corrected downstream, whereas the later stages bear more responsibility for the final prediction. The later stages therefore need more training attention.
2. **Cascading errors in the decoding sequence**:
   - Because the decoder is sequential, once an intermediate query has been refined by a stage (whether the refinement helps or hurts), the refined query is what gets passed on to the subsequent stages. Errors therefore cascade, convergence becomes harder, and the later stages never see the earlier queries during training.

### Proposed solution

To address these limitations, the authors propose the Selective Query Recollection (SQR) strategy. Its main features are:

- **Cumulatively collect intermediate queries**: As decoding goes deeper, the intermediate queries produced so far are collected.
- **Selectively forward queries**: In addition to the sequential path, collected queries are selectively passed to the downstream stages, so the later stages can work directly with queries from the early stages.
- **Strengthen supervision in the later stages**: Because the later stages process more queries, they receive more supervision signals and hence more training attention.
- **Alleviate the impact of cascading errors**: The later stages can directly access queries from the early stages, which reduces the impact of cascading errors.

### Experimental results

The authors applied SQR to several query-based object detectors (Adamixer, DAB-DETR, and Deformable-DETR) under different settings. SQR improves these models by 1.4-2.8 AP while leaving the inference pipeline unchanged.

### Summary

SQR targets two problems in the training of query-based object detectors: insufficient training emphasis on the later stages and cascading errors along the decoding sequence. The method improves model performance and suggests a direction for further optimizing query-based object detectors.
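The recollection scheme described under "Proposed solution" can be illustrated with a small, framework-free sketch. It is a toy under stated assumptions, not the authors' implementation. The selection rule used here (each stage reprocesses the outputs of the last `recollect` stages) is an assumption made for illustration, since the text above only says that queries are collected cumulatively and forwarded selectively. The names `make_toy_stage`, `decode_with_recollection`, and `recollect` are hypothetical, and lists of floats stand in for query embeddings.

```python
from typing import Callable, List, Sequence

Query = List[float]               # toy stand-in for a set of query embeddings
Stage = Callable[[Query], Query]  # one decoding stage: refines a query


def make_toy_stage(shift: float) -> Stage:
    """Hypothetical stage: halves each value and adds a stage-specific shift."""
    def stage(query: Query) -> Query:
        return [0.5 * x + shift for x in query]
    return stage


def decode_with_recollection(
    init_query: Query, stages: Sequence[Stage], recollect: int = 2
) -> List[List[Query]]:
    """Run the decoder and return the outputs produced by every stage.

    Assumed selection rule: each stage processes the outputs of the previous
    `recollect` stages (the initial query counts as the output of "stage 0").
    recollect=1 reduces to the plain sequential decoder.
    """
    if recollect < 1:
        raise ValueError("recollect must be at least 1")
    collected: List[List[Query]] = [[init_query]]
    for stage in stages:
        # Newest group first, so index 0 always follows the sequential path.
        inputs = [q for group in reversed(collected[-recollect:]) for q in group]
        collected.append([stage(q) for q in inputs])
    return collected[1:]  # drop the "stage 0" placeholder


stages = [make_toy_stage(float(k)) for k in range(6)]
train_outputs = decode_with_recollection([1.0, 2.0], stages, recollect=2)
infer_outputs = decode_with_recollection([1.0, 2.0], stages, recollect=1)

# Each output would receive its own supervision signal during training.
supervision_counts = [len(outs) for outs in train_outputs]
print(supervision_counts)  # [1, 2, 3, 5, 8, 13]
```

Under this assumed rule, the number of supervised outputs grows with depth, so the later stages receive more training signal, and every later stage also sees queries that skipped the stage immediately before it. With `recollect=1` the same function is the ordinary sequential decoder, which mirrors the claim that the inference pipeline is left unchanged.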