Triangulation Learning Network: from Monocular to Stereo 3D Object Detection

Zengyi Qin,Jinglu Wang,Yan Lu

DOI: https://doi.org/10.48550/arXiv.1906.01193

2019-06-04

Abstract: In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information. Different from previous methods using pixel-level depth maps, we propose employing 3D anchors to explicitly construct object-level correspondences between the regions of interest in stereo images, from which the deep neural network learns to detect and triangulate the targeted object in 3D space. We also introduce a cost-efficient channel reweighting strategy that enhances representational features and weakens noisy signals to facilitate the learning process. All of these are flexibly integrated into a solid baseline detector that uses monocular images. We demonstrate that both the monocular baseline and the stereo triangulation learning network outperform the prior state of the art in 3D object detection and localization on the challenging KITTI dataset.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the key challenge in stereo 3D object detection: how to use stereo information effectively. It proposes the Triangulation Learning Network (TLNet), which uses 3D anchors to explicitly construct object-level correspondences between the left and right images, so that the network learns to detect and triangulate the target object in 3D space. Unlike previous methods that rely on pixel-level depth maps, TLNet avoids computing costly pixel-level disparity maps and instead uses the 3D anchors directly to guide triangulation learning. The paper also introduces a cost-efficient channel reweighting strategy that enhances informative features and suppresses noisy signals to ease learning. The main contributions are:

1. A 3D detection baseline that uses only monocular images, with performance comparable to state-of-the-art stereo methods.
2. TLNet, which exploits the geometric correlation between stereo images to localize 3D objects accurately and significantly improves on the baseline's 3D detection and localization performance on the challenging KITTI dataset.
3. A feature reweighting strategy that strengthens informative feature channels by measuring left-right consistency, so that the network focuses on the key parts of the object, which benefits triangulation learning.
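The two ideas summarized above, object-level correspondence via 3D anchors and channel reweighting by left-right consistency, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a rectified pinhole stereo pair with shared intrinsics, axis-aligned anchors, and cosine similarity as the consistency measure (the summary does not say which measure is used). All function names, parameters, and numeric values are hypothetical.

```python
import math

def project_point(point, fx, fy, cx, cy, baseline_offset=0.0):
    """Pinhole projection of a 3D point (left-camera coordinates, metres).

    baseline_offset is the camera's shift along x: 0 for the left camera,
    the stereo baseline for the right camera (assumed rectified pair).
    """
    x, y, z = point
    u = fx * (x - baseline_offset) / z + cx
    v = fy * y / z + cy
    return u, v

def anchor_corners(center, size):
    """Eight corners of an axis-aligned 3D anchor (an assumed simplification)."""
    cx, cy, cz = center
    w, h, l = size
    return [(cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def project_anchor(center, size, fx, fy, cx, cy, baseline_offset=0.0):
    """2D box (u_min, v_min, u_max, v_max) enclosing the projected anchor."""
    pts = [project_point(p, fx, fy, cx, cy, baseline_offset)
           for p in anchor_corners(center, size)]
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return min(us), min(vs), max(us), max(vs)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def reweight_channels(left, right):
    """Scale each channel by its left-right similarity.

    left, right: lists of channels, each a flat list of RoI activations.
    Returns (weights, reweighted_left, reweighted_right).
    """
    weights = [cosine_similarity(l, r) for l, r in zip(left, right)]
    new_left = [[w * x for x in ch] for w, ch in zip(weights, left)]
    new_right = [[w * x for x in ch] for w, ch in zip(weights, right)]
    return weights, new_left, new_right

if __name__ == "__main__":
    # Illustrative intrinsics and baseline, not taken from the paper.
    cam = dict(fx=720.0, fy=720.0, cx=620.0, cy=187.0)
    anchor = ((0.0, 1.0, 20.0), (1.6, 1.5, 3.9))
    left_roi = project_anchor(*anchor, **cam)
    right_roi = project_anchor(*anchor, **cam, baseline_offset=0.54)
    print("left RoI: ", left_roi)
    print("right RoI:", right_roi)
```

Because both regions of interest come from the same 3D anchor, they are paired by construction, with no pixel-level disparity map involved. In this sketch, channels whose left and right activations disagree receive weights near zero, which is one plausible reading of "weakening noisy signals".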

SGM3D: Stereo Guided Monocular 3D Object Detection

Object-Centric Stereo Matching for 3D Object Detection

Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection

FCNet: Stereo 3D Object Detection with Feature Correlation Networks

An Efficient 3D Object Detection Method Based on Fast Guided Anchor Stereo RCNN

DSGN: Deep Stereo Geometry Network for 3D Object Detection

Stereo R-CNN based 3D Object Detection for Autonomous Driving

Reinforced Axial Refinement Network for Monocular 3D Object Detection

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

Stereo RGB and Deeper LIDAR-Based Network for 3D Object Detection in Autonomous Driving

MonoGRNet: A General Framework for Monocular 3D Object Detection

Multi-Dimensional Cooperative Network for Stereo Matching

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

Depth-Enhancement Network for Monocular 3D object detection

Monocular 3D Detection With Geometric Constraint Embedding and Semi-Supervised Training

MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

3D Object Aided Self-Supervised Monocular Depth Estimation

SCANet: Spatial-Channel Attention Network for 3D Object Detection

Point-Guided Contrastive Learning for Monocular 3-D Object Detection