Abstract: This paper aims to develop a faster and more accurate solution to the amodal 3D object detection problem in indoor scenes. The solution is a novel neural network structure that takes a pair of RGB-D images as input and delivers oriented 3D bounding boxes as output. The network, named 3D-SSD, has two components: hierarchical feature fusion and multi-layer prediction. The hierarchical feature fusion combines multi-scale appearance and geometric features learned from RGB-D images, which are later used in the multi-layer prediction for object detection. Both accuracy and efficiency can be improved by exploiting 2.5D representations in a synergistic way. To specifically address the shape variance of different objects, a set of 3D anchor boxes with varying physical sizes is attached to every location on the prediction layers. At test time, category scores are generated for the 3D anchor boxes along with adjusted positions, sizes and orientations, and the final detections are obtained with non-maximum suppression. Comprehensive experiments have been performed on the publicly accessible SUN RGB-D and NYUv2 datasets. The results show that the proposed algorithm is the first 3D detector that runs in near real-time on these challenging datasets with performance competitive with state-of-the-art methods. 3D-SSD achieves 37.1% mAP on the SUN RGB-D dataset at around 5.6 fps, outperforming the state-of-the-art Deep Sliding Shape by 10.2% mAP while running around 109× faster. In an efficient model setting running at 9.3 fps, 3D-SSD still achieves 37% mAP. Further experiments suggest that the proposed approach achieves comparable accuracy and is about 477× faster than the state-of-the-art method on the NYUv2 dataset, even with a smaller input image size. (C) 2019 Published by Elsevier B.V.
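
The final step described in the abstract, selecting detections from scored 3D boxes with non-maximum suppression, can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity it assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), whereas 3D-SSD predicts oriented boxes. The function names `iou_3d` and `nms_3d` and the threshold value of 0.5 are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box layout (not from the paper): (xmin, ymin, zmin, xmax, ymax, zmax)
Box = Tuple[float, float, float, float, float, float]


def volume(b: Box) -> float:
    """Volume of an axis-aligned 3D box."""
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """Volumetric intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:
            # No overlap along this axis, so the boxes do not intersect.
            return 0.0
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)


def nms_3d(boxes: Sequence[Box], scores: Sequence[float],
           iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes in descending score order and keep a box
    only if its IoU with every already-kept box is at most the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [
        (0.0, 0.0, 0.0, 1.0, 1.0, 1.0),
        (0.1, 0.0, 0.0, 1.1, 1.0, 1.0),  # heavily overlaps the first box
        (2.0, 2.0, 2.0, 3.0, 3.0, 3.0),  # disjoint from the others
    ]
    print(nms_3d(candidates, [0.9, 0.8, 0.7]))
```

Handling oriented boxes would require replacing `iou_3d` with an overlap computation for rotated cuboids; the greedy selection loop itself would stay the same.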

Multi-view shape estimation of transparent containers

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection

Container Localisation and Mass Estimation with an RGB-D Camera

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

Robust 3D Reconstruction with an RGB-D Camera

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Object Segmentation Ensuring Consistency Across Multi-Viewpoint Images

CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

6DoF Pose Estimation of Transparent Object from a Single RGB-D Image

Joint Multiview Segmentation and Localization of RGB-D Images Using Depth-Induced Silhouette Consistency

From Points to Multi-Object 3D Reconstruction

Transparency-Aware Segmentation of Glass Objects to Train RGB-Based Pose Estimators

Multi-view object pose estimation from correspondence distributions and epipolar geometry

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior

View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

Virtual Multi-view Fusion for 3D Semantic Segmentation