Abstract: DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, treating all tokens equally without discrimination brings a redundant computational burden in the traditional encoder structure. Recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity while maintaining performance through a sparse encoder. But these methods tend to rely on unreliable model statistics. Moreover, simply reducing the token population hinders detection performance to a large extent, limiting the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both localization and category semantic information of the objects from multi-scale feature maps. Based on the scores, we efficiently abandon the background queries and enhance the semantic interaction of the fine-grained object queries. Compared with the state-of-the-art sparse DETR-like detectors under the same setting, our Focus-DETR has comparable complexity while achieving 50.4 AP (+2.2) on COCO. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR.
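
The score-based token selection described in the abstract can be illustrated with a minimal sketch. It is not the Focus-DETR implementation: the function name `select_foreground_tokens`, the `keep_ratio` parameter, and the hand-written scores are hypothetical, and the scores here stand in for the output of a learned scoring module that the abstract does not specify in detail.

```python
from typing import List, Sequence


def select_foreground_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring tokens, in their original order.

    `scores` is a hypothetical per-token foreground score; in a real model it
    would come from a learned scoring module, not from hand-written values.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Always keep at least one token so the encoder has something to attend to.
    num_keep = max(1, int(len(scores) * keep_ratio))
    # Rank token indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore spatial order among the kept tokens.
    return sorted(ranked[:num_keep])


# Eight tokens with illustrative scores; keep the top half.
example_scores = [0.9, 0.1, 0.05, 0.8, 0.2, 0.02, 0.6, 0.01]
kept = select_foreground_tokens(example_scores, keep_ratio=0.5)
print(kept)
```

Only the kept tokens would then be passed to the more expensive attention layers, while the remaining (background) tokens skip them. How the scores are supervised and how the kept tokens are further refined are design choices of the paper that this sketch does not attempt to reproduce.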

Pruning DETR: efficient end-to-end object detection with sparse structured pruning

Efficient DETR: Improving End-to-End Object Detector with Dense Prior

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

Deformable DETR: Deformable Transformers for End-to-End Object Detection

End-to-End Object Detection with Adaptive Clustering Transformer

L-DETR: A Light-Weight Detector for End-to-End Object Detection With Transformers

DETR-ORD: An Improved DETR Detector for Oriented Remote Sensing Object Detection with Feature Reconstruction and Dynamic Query

DETR++: Taming Your Multi-Scale Detection Transformer

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection

Less is More: Focus Attention for Efficient DETR

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Towards Data-Efficient Detection Transformers

Revisiting DETR Pre-training for Object Detection

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

Rethinking Transformer-based Set Prediction for Object Detection

Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection

DHS-DETR: Efficient DETRs with Dynamic Head Switching

Focus-Attention Approach in Optimizing DETR for Object Detection from High-Resolution Images

ComplETR: Reducing the cost of annotations for object detection in dense scenes with vision transformers

An efficient fire and smoke detection algorithm based on an end-to-end structured network

PMG-DETR: fast convergence of DETR with position-sensitive multi-scale attention and grouped queries