A Multilevel Fusion Network for 3D Object Detection.

Chunlong Xia,Ping Wei,Wenwen Wei,Nanning Zheng
DOI: https://doi.org/10.1016/j.neucom.2021.01.025
IF: 6
2021-01-01
Neurocomputing
Abstract:3D object detection is an important yet challenging problem in a myriad of vision, robotics, and human- machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. While the previous studies have made remarkable progress over the past decade, how to effectively exploit the feature fusion with neural networks for boosting 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two streams of neural networks which respectively extracts the RGB and depth features with cascaded convolutional modules. To effectively exploit the information of 3D objects, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function by encoding the difference of geometric attributes on 3D bounding box regression. Since the original depth data is full of noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We test the proposed model on challenging RGB-D datasets. The experimental results on the datasets prove the strength and advantage of the proposed model. (c) 2021 Elsevier B.V. All rights reserved. 3D object detection is an important yet challenging problem in a myriad of vision, robotics, and human? machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. While the previous studies have made remarkable progress over the past decade, how to effectively exploit the feature fusion with neural networks for boosting 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two streams of neural networks which respectively extracts the RGB and depth features with cascaded convolutional modules. To effectively exploit the information of 3D objects, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function by encoding the difference of geometric attributes on 3D bounding box regression. Since the original depth data is full of noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We test the proposed model on challenging RGB-D datasets. The experimental results on the datasets prove the strength and advantage of the proposed model.
What problem does this paper attempt to address?