Abstract: 3D object detection is an important yet challenging problem in a myriad of vision, robotics, and human-machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. Although previous studies have made remarkable progress over the past decade, how to effectively exploit feature fusion with neural networks to boost 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two streams of neural networks, which respectively extract the RGB and depth features with cascaded convolutional modules. To effectively exploit the information of 3D objects, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function that encodes the difference of geometric attributes in 3D bounding box regression. Since the original depth data is full of noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We test the proposed model on challenging RGB-D datasets. The experimental results on these datasets demonstrate the strength and advantage of the proposed model. (c) 2021 Elsevier B.V. All rights reserved.
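The abstract does not give the form of the weighted loss, so the following is only an illustrative sketch of what weighting geometric attributes in 3D box regression could look like, not the paper's actual formulation. It assumes a hypothetical 7-parameter box layout (center, size, heading angle), a smooth L1 penalty per parameter, and hand-picked per-group weights; all names (`GROUPS`, `weighted_box_loss`, the weight keys) are invented for illustration.

```python
from typing import Dict, Sequence, Tuple


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Smooth L1 penalty: quadratic near zero, linear for large errors."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


# Hypothetical box layout (an assumption, not taken from the paper):
# (cx, cy, cz, w, h, l, theta)
GROUPS: Dict[str, Tuple[int, ...]] = {
    "center": (0, 1, 2),
    "size": (3, 4, 5),
    "angle": (6,),
}


def weighted_box_loss(
    pred: Sequence[float],
    target: Sequence[float],
    weights: Dict[str, float],
) -> float:
    """Sum smooth L1 errors per attribute group, scaled by that group's weight.

    Note: angle differences are not wrapped to [-pi, pi] in this sketch.
    """
    if len(pred) != 7 or len(target) != 7:
        raise ValueError("expected 7 box parameters")
    total = 0.0
    for name, indices in GROUPS.items():
        group_loss = sum(smooth_l1(pred[i] - target[i]) for i in indices)
        total += weights[name] * group_loss
    return total


if __name__ == "__main__":
    pred = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
    target = [0.5, 0.0, 0.0, 1.0, 1.0, 3.0, 0.0]
    w = {"center": 1.0, "size": 2.0, "angle": 0.5}
    print(weighted_box_loss(pred, target, w))
```

Grouping the parameters lets errors in size, position, and orientation contribute at different scales, which is one plausible reading of "encoding the difference of geometric attributes"; in practice such weights would be tuned or learned rather than fixed by hand.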

A Multilevel Fusion Network for 3D Object Detection.

MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection

MFDetection: A Highly Generalized Object Detection Network Unified with Multilevel Heterogeneous Image Fusion

A Multi-Level Semantic Fusion VoteNet for 3D Object Detection on Point Clouds

Cascaded Cross-Modality Fusion Network for 3D Object Detection

Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

MFUR-Net

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection

Lightweight Multi-Level Feature Difference Fusion Network for RGB-D-T Salient Object Detection

MFFNet: Multimodal Feature Fusion Network for RGB-D Transparent Object Detection

BMFN3D: Bidirectional multilayer fusion network for indoor 3D object detection

MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

3DMMF: 3D Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

Fine-Grained Multilevel Fusion for Anti-Occlusion Monocular 3D Object Detection

Multi-feature Fusion VoteNet for 3D Object Detection

Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

One-Stage Multi-Sensor Data Fusion Convolutional Neural Network For 3d Object Detection

TBFNT3D: Two-Branch Fusion Network with Transformer for Multimodal Indoor 3D Object Detection

DMFF: dual-way multimodal feature fusion for 3D object detection

HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection