CoDRMA: Collaborative Depth Refinement Via Dual-Mask and Dual-Attention for Bird’s Eye View Collaborative 3D Object Detection

Kang Yang, Yongcai Wang, Yunjun Han, Qing-Shan Jia
DOI: https://doi.org/10.1109/case59546.2024.10711318
2024-01-01
Abstract: In collaborative perception, the camera-based approach is more informative and economical than the LiDAR-based approach. However, camera-based methods still show a significant performance gap relative to LiDAR-based methods because of the difficulty and uncertainty of depth estimation. This paper introduces a strategy that refines depth estimation using foreground and background information, enabling accurate Bird’s Eye View (BEV) collaborative 3D detection by multiple agents. Our strategy comprises two stages. In the first stage, we introduce the Dual-Mask to enhance depth estimation and employ the BEV paradigm to integrate multi-viewpoint data, facilitating comprehensive scene analysis. In the second stage, we generate pseudo-images by fusing depth and masks as auxiliary messages. A Dual-Attention scheme is proposed, which leverages multi-agent communication to augment the auxiliary information and further refine the depth estimates. By refining the depth information twice, our method effectively improves BEV-based collaborative 3D object detection accuracy, especially for occluded and long-distance objects. Experiments on the OPV2V dataset show that our method achieves state-of-the-art performance on the 3D object detection task among known camera-based methods, narrowing the gap with LiDAR-based methods. Code will be made available.