Abstract: This paper develops a faster and more accurate solution to the amodal 3D object detection problem in indoor scenes. The solution is a novel neural network that takes a pair of RGB-D images as input and outputs oriented 3D bounding boxes. The network, named 3D-SSD, has two components: hierarchical feature fusion and multi-layer prediction. The hierarchical feature fusion combines multi-scale appearance and geometric features learned from the RGB-D images, which the multi-layer prediction then uses for object detection. Exploiting the 2.5D representations in a synergistic way improves both accuracy and efficiency. To address the shape variance of different objects, a set of 3D anchor boxes with varying physical sizes is attached to every location on the prediction layers. At test time, category scores are generated for the 3D anchor boxes together with adjusted positions, sizes and orientations, and the final detections are obtained by non-maximum suppression. Comprehensive experiments have been performed on the publicly available SUN RGB-D and NYUv2 datasets. The results show that the proposed algorithm is the first 3D detector to run in near real-time on these challenging datasets while remaining competitive with state-of-the-art methods. 3D-SSD achieves 37.1% mAP on the SUN RGB-D dataset at around 5.6 fps, outperforming the state-of-the-art Deep Sliding Shapes by 10.2% mAP while running around 109× faster. In a more efficient model setting running at 9.3 fps, 3D-SSD still achieves 37% mAP. Further experiments suggest that the proposed approach achieves comparable accuracy and is about 477× faster than the state-of-the-art method on the NYUv2 dataset, even with a smaller input image size. (C) 2019 Published by Elsevier B.V.
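
The abstract states that final detections are obtained by applying non-maximum suppression to scored 3D boxes. As a rough illustration of that last step only, the sketch below implements greedy NMS in plain Python. It is not the authors' implementation: it assumes axis-aligned boxes (the paper predicts oriented boxes, whose overlap needs a rotated-box intersection), and the names `iou_3d`, `nms_3d` and the default threshold of 0.25 are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Assumption: an axis-aligned box is (xmin, ymin, zmin, xmax, ymax, zmax).
# The paper's boxes are oriented; axis alignment is a simplification here.
Box = Tuple[float, float, float, float, float, float]


def volume(b: Box) -> float:
    """Volume of an axis-aligned box; zero if any extent is non-positive."""
    dx, dy, dz = b[3] - b[0], b[4] - b[1], b[5] - b[2]
    return max(dx, 0.0) * max(dy, 0.0) * max(dz, 0.0)


def iou_3d(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    # The overlap region is itself a box; it has zero volume if they are disjoint.
    inter_box = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = volume(inter_box)
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0.0 else 0.0


def nms_3d(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.25,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou_3d(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [
        (0.0, 0.0, 0.0, 1.0, 1.0, 1.0),
        (0.1, 0.0, 0.0, 1.1, 1.0, 1.0),  # heavily overlaps the first box
        (5.0, 5.0, 5.0, 6.0, 6.0, 6.0),  # far away from both
    ]
    print(nms_3d(candidates, [0.9, 0.8, 0.7]))
```

In a detector of this kind, such a routine would typically be run per category on the boxes whose scores pass a confidence cutoff; that detail is likewise an assumption rather than something the abstract specifies.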

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection

Object-aware Semantic Mapping of Indoor Scenes Using Octomap

2D-to-3D Projection for Monocular and Multi-View 3D Multi-class Object Detection in Indoor Scenes

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

Refined Voting and Scene Feature Fusion for 3D Object Detection in Point Clouds

Hybrid 3D Reconstruction of Indoor Scenes Integrating Object Recognition

RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies

DMSC-Net: A deep Multi-Scale context network for 3D object detection of indoor point clouds

SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

CAF-RCNN: multimodal 3D object detection with cross-attention

CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds

SCANet: Spatial-Channel Attention Network for 3D Object Detection

UniDet3D: Multi-dataset Indoor 3D Object Detection