Rethinking CNN Architectures in Transformer Detectors

Mengze Pan, Kai Tian, Qingmin Liao
DOI: https://doi.org/10.1007/978-3-031-44204-9_32
2023-01-01
Abstract: Since the introduction of the Transformer into object detection, numerous researchers have sought to leverage its strong long-range dependency modeling capabilities. However, high computational cost and a lack of prior knowledge remain persistent pain points. In this paper, we seek an alternative way to improve DETR-like models by rethinking the role of CNNs in DETR from different perspectives. We propose a novel multi-scale patch embedding module, a new DETR encoder module, and an auxiliary assessment strategy, which bring prior knowledge into DETR to accelerate convergence and enhance final performance.