Cross-modal Deformable DETR for RGB-D Object Detection

Siyuan Qin, Zongqing Lu, An Wang, Nan Su
DOI: https://doi.org/10.1117/12.2682571
2023-01-01
Abstract: RGB-D object detection is a challenging task because it requires effective processing of both visible-modality and depth-modality features. However, existing RGB-D object detection models have several deficiencies, including reliance on hand-crafted settings and insufficient ability to fuse cross-modal features. In this paper, we propose a novel cross-modal RGB-D object detection model based on Deformable DETR, named CM-DETR. The proposed model effectively fuses multi-modal information and does not need hand-crafted settings derived from prior information. Extensive experiments show that our model achieves a substantial improvement, exceeding the baseline by more than 4.6% mAP on SUN-RGBD and 6.9% mAP on NYUDv2.