Deep multi-scale and multi-modal fusion for 3D object detection

Rui Guo, Deng Li, Yahong Han
DOI: https://doi.org/10.1016/j.patrec.2021.08.028
IF: 4.757
2021-11-01
Pattern Recognition Letters
Abstract: Perceiving 3D objects in a scene is the basis of autonomous driving. Most autonomous vehicles are equipped with cameras and Lidar to obtain 3D spatial information. RGB images from the camera and point clouds produced by Lidar each have their own advantages for 3D object detection. To make better use of both image data and point cloud data, a 3D object detection method based on Deep Multi-scale and Multi-modal Fusion (DMMF) is proposed. First, the point cloud is projected to the Bird's Eye View (BEV), and features are extracted from the BEV map and the RGB image, respectively, with a feature extractor. The multi-modal features are then fused with the deep multi-scale fusion method and finally fed to a position regression and classification network for object classification and accurate localization. Experimental results on the KITTI benchmark dataset show that the method reaches state-of-the-art performance in both the car and pedestrian classes; the detection AP improves significantly, especially on hard-level data.
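The first step named in the abstract, projecting the Lidar point cloud to a BEV map, can be illustrated with a minimal sketch. The abstract does not specify how DMMF encodes the BEV map, so everything below is an assumption made for illustration: the function name `point_cloud_to_bev`, the crop ranges, the grid resolution, and the choice of a max-height channel plus a point-count channel are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the Lidar frame, in metres


def point_cloud_to_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 70.0),    # assumed forward crop
    y_range: Tuple[float, float] = (-40.0, 40.0),  # assumed lateral crop
    resolution: float = 0.1,                       # assumed cell size in metres
) -> Tuple[List[List[float]], List[List[int]]]:
    """Rasterize points onto a ground-plane grid.

    Returns (height, count): height[r][c] is the maximum z of the points
    that fall in the cell (0.0 if the cell is empty), and count[r][c] is
    the number of points in the cell.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int(round((x_max - x_min) / resolution))
    cols = int(round((y_max - y_min) / resolution))

    height = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]

    for x, y, z in points:
        # Skip points outside the cropped region.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Map metric coordinates to integer cell indices; the clamp
        # guards against floating-point rounding at the far edge.
        r = min(rows - 1, math.floor((x - x_min) / resolution))
        c = min(cols - 1, math.floor((y - y_min) / resolution))
        # Keep the highest point seen so far in this cell.
        if count[r][c] == 0 or z > height[r][c]:
            height[r][c] = z
        count[r][c] += 1

    return height, count
```

In a pipeline of the kind the abstract describes, grids like these would be stacked as channels of a BEV image and passed to the feature extractor, while the RGB image is processed separately before fusion.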
computer science, artificial intelligence