A multi-task Faster R-CNN method for 3D vehicle detection based on a single image

Wankou Yang, Ziyu Li, Chao Wang, Jun Li

DOI: https://doi.org/10.1016/j.asoc.2020.106533

IF: 8.7

2020-10-01

Applied Soft Computing

Abstract: Vehicle detection is an important part of robot environmental perception. In this paper, a 3D vehicle detection method using a single image is proposed to generate the 3D spatial coordinates of an object from monocular vision for autonomous driving. The proposed method works within a multi-task framework and integrates 2D object detection, 3D object detection, orientation estimation and key point detection into one unified deep convolutional neural network (DCNN) that can be trained end-to-end. The method is built by modifying Faster R-CNN using multi-task learning, and is therefore named multi-task Faster R-CNN (MT-Faster R-CNN). Experiments on the KITTI dataset are conducted to evaluate the proposed method against other 3D vehicle detection methods. The experimental results demonstrate that the proposed method is competitive and could significantly assist autonomous driving.

computer science, artificial intelligence, interdisciplinary applications

What problem does this paper attempt to address?

The paper primarily addresses the problem of 3D vehicle detection based on a single image and proposes a method called "Multi-Task Faster R-CNN" (MT-Faster R-CNN). Specifically, the paper aims to achieve the following points:

1. **Propose a new geometric constraint**: Unlike previous methods, this approach uses keypoint coordinates as a new geometric constraint instead of traditional 2D bounding boxes.
2. **Combine 2D and 3D detection**: Integrate 2D and 3D vehicle detection into the same deep convolutional neural network, achieving end-to-end training through multi-task learning.
3. **Improve performance**: Experimental results show that the proposed MT-Faster R-CNN method significantly outperforms baseline methods on the KITTI dataset, demonstrating excellent performance in 3D vehicle detection.
4. **Method improvement**: This method improves the traditional Faster R-CNN network by using a multi-task learning framework to simultaneously optimize multiple loss functions, enabling the entire network to be trained end-to-end and enhancing optimization effectiveness.

In summary, this research aims to develop a new method that can efficiently and accurately extract 3D vehicle information from a single image to support applications such as autonomous driving.
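To make the idea of simultaneously optimizing multiple loss functions concrete, the following is a minimal sketch of one common way to combine per-task losses: a weighted sum over the four tasks named in the abstract. The names `TaskLosses` and `multi_task_loss`, the default weight of 1.0, and the example values are all illustrative assumptions; the summary does not state how the paper actually weights or formulates its losses.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional


@dataclass
class TaskLosses:
    """Per-task scalar losses for one training step (hypothetical container)."""
    det2d: float        # 2D object detection loss
    det3d: float        # 3D object detection loss
    orientation: float  # orientation estimation loss
    keypoints: float    # key point detection loss


def multi_task_loss(losses: TaskLosses,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Combine task losses into one scalar via a weighted sum.

    Assumption: any task without an explicit weight uses 1.0. This is a
    generic multi-task formulation, not the paper's confirmed loss.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value
               for name, value in asdict(losses).items())


# Example with made-up loss values: equal weighting, then up-weighting 3D detection.
step = TaskLosses(det2d=1.0, det3d=2.0, orientation=0.5, keypoints=0.5)
equal_total = multi_task_loss(step)
weighted_total = multi_task_loss(step, {"det3d": 2.0})
```

Because the combined value is a single scalar, one backward pass through a shared network would update all task heads and the shared backbone together, which is what allows end-to-end training in a setup like this.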

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Monocular 3-D Vehicle Detection Using a Cascade Network for Autonomous Driving

Vehicle Behavior Recognition using Multi-Stream 3D Convolutional Neural Network

Multi-Task Vehicle Detection with Region-of-Interest Voting.

Stereo R-CNN based 3D Object Detection for Autonomous Driving

Multiclass objects detection algorithm using DarkNet-53 and DenseNet for intelligent vehicles

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

RTM3D: Real-Time Monocular 3D Detection from Object Keypoints for Autonomous Driving

Image Guidance Based 3D Vehicle Detection in Traffic Scene.

RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving

FP-RCNN: A Real-Time 3D Target Detection Model based on Multiple Foreground Point Sampling for Autonomous Driving

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

Vehicle 3D Localization in Road Scenes via a Monocular Moving Camera

Ground-aware Monocular 3D Object Detection for Autonomous Driving

MonoDCN: Monocular 3D object detection based on dynamic convolution

Monocular 3D object detection using dual quadric for autonomous driving

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Real-time Vehicle Detection and Tracking in Video Based on Faster R-CNN