A multi-task Faster R-CNN method for 3D vehicle detection based on a single image

Wankou Yang, Ziyu Li, Chao Wang, Jun Li
DOI: https://doi.org/10.1016/j.asoc.2020.106533
IF: 8.7
2020-10-01
Applied Soft Computing
Abstract: Vehicle detection is an important part of robot environmental perception. In this paper, a 3D vehicle detection method using a single image is proposed to generate the 3D spatial coordinates of objects from monocular vision for autonomous driving. The proposed method works under a multi-task framework and integrates 2D object detection, 3D object detection, orientation estimation and keypoint detection into one unified deep convolutional neural network (DCNN) that can be trained end-to-end. Because it is built by extending Faster R-CNN with multi-task learning, the proposed method is named multi-task Faster R-CNN (MT-Faster R-CNN). Experiments on the KITTI dataset are conducted to evaluate the proposed method against other 3D vehicle detection methods. The experimental results demonstrate that the proposed method is competitive and could significantly assist autonomous driving.
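The abstract states that the four tasks share one network trained end-to-end. In multi-task learning this is commonly done by minimizing a weighted sum of per-task losses. The pure-Python sketch below illustrates such a combined objective for a single detected vehicle. The loss forms (softmax cross-entropy for the class, smooth L1 for box, dimension and keypoint regression, and a (sin, cos) encoding for orientation), the equal default weights, and every name (`multitask_loss`, `TaskWeights`, the dictionary keys) are assumptions for illustration, not the paper's published formulation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth-L1 (Huber-style) loss over paired vector elements."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def orientation_loss(pred_sin_cos: Sequence[float], yaw: float) -> float:
    """Regress (sin, cos) of the yaw so that yaw and yaw + 2*pi share one target."""
    return smooth_l1(pred_sin_cos, [math.sin(yaw), math.cos(yaw)])


@dataclass
class TaskWeights:
    # Hypothetical per-task weights; the paper's actual values are not given here.
    cls: float = 1.0
    box2d: float = 1.0
    dims3d: float = 1.0
    orient: float = 1.0
    keypoints: float = 1.0


def multitask_loss(pred: Dict, target: Dict,
                   weights: Optional[TaskWeights] = None) -> Tuple[float, Dict[str, float]]:
    """Weighted sum of per-task losses for one detected vehicle (hypothetical layout)."""
    w = weights if weights is not None else TaskWeights()
    losses = {
        "cls": cross_entropy(pred["cls_logits"], target["label"]),
        "box2d": smooth_l1(pred["box2d"], target["box2d"]),
        "dims3d": smooth_l1(pred["dims3d"], target["dims3d"]),
        "orient": orientation_loss(pred["sin_cos"], target["yaw"]),
        "keypoints": smooth_l1(pred["keypoints"], target["keypoints"]),
    }
    total = sum(getattr(w, name) * value for name, value in losses.items())
    return total, losses


if __name__ == "__main__":
    # Every regression head is exact, the classifier is undecided between 2 classes.
    pred = {"cls_logits": [0.0, 0.0], "box2d": [10.0, 20.0, 50.0, 60.0],
            "dims3d": [1.5, 1.6, 3.9], "sin_cos": [math.sin(0.3), math.cos(0.3)],
            "keypoints": [100.0, 120.0]}
    target = {"label": 1, "box2d": [10.0, 20.0, 50.0, 60.0], "dims3d": [1.5, 1.6, 3.9],
              "yaw": 0.3, "keypoints": [100.0, 120.0]}
    total, parts = multitask_loss(pred, target)
    print(round(total, 4), parts)
```

In a full network each term would be averaged over proposals and back-propagated through the shared backbone; the weights are what balance tasks whose losses live on very different scales, and encoding yaw as (sin, cos) avoids penalizing a prediction of 359° against a target of 1°.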
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
The paper addresses 3D vehicle detection from a single image and proposes a method called Multi-Task Faster R-CNN (MT-Faster R-CNN). Specifically, it aims to:

1. **Introduce a new geometric constraint**: unlike previous methods, which rely on 2D bounding boxes, this approach uses keypoint coordinates as the geometric constraint.
2. **Combine 2D and 3D detection**: 2D and 3D vehicle detection are integrated into the same deep convolutional neural network and trained end-to-end through multi-task learning.
3. **Improve performance**: experiments on the KITTI dataset show that MT-Faster R-CNN significantly outperforms the baseline methods on 3D vehicle detection.
4. **Extend Faster R-CNN**: the standard Faster R-CNN network is modified within a multi-task learning framework so that several loss functions are optimized jointly, allowing the whole network to be trained end-to-end.

In summary, the research aims to develop a method that efficiently and accurately extracts 3D vehicle information from a single image to support applications such as autonomous driving.
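To make the keypoint-as-constraint idea concrete, the sketch below assumes the keypoints are the image projections of the eight corners of the 3D box, uses KITTI-style camera conventions (y pointing down, yaw about the y axis, box location at the bottom centre), and uses an ideal pinhole camera with hypothetical intrinsics. It scores a candidate 3D box by its keypoint reprojection error and, holding everything except depth fixed, recovers the depth by grid search. The paper's actual keypoint definition and the way it applies the constraint may differ; all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def box_corners_3d(dims: Sequence[float], location: Point3, yaw: float) -> List[Point3]:
    """Eight corners of a 3D box in camera coordinates.

    Assumed KITTI-style conventions: dims = (h, w, l) in metres, `location`
    is the bottom centre of the box, y points down, yaw rotates about y.
    """
    h, w, l = dims
    xs = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    ys = [0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h]
    zs = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for x, y, z in zip(xs, ys, zs):
        # Rotation about the y axis: R = [[c, 0, s], [0, 1, 0], [-s, 0, c]].
        rx = c * x + s * z
        rz = -s * x + c * z
        corners.append((rx + location[0], y + location[1], rz + location[2]))
    return corners


def project(points: Sequence[Point3], fx: float, fy: float, cx: float, cy: float) -> List[Point2]:
    """Ideal pinhole projection (no lens distortion, no stereo offset term)."""
    return [(fx * X / Z + cx, fy * Y / Z + cy) for X, Y, Z in points]


def keypoint_residual(dims, location, yaw, keypoints_2d, intrinsics) -> float:
    """Sum of squared pixel distances between projected corners and keypoints."""
    projected = project(box_corners_3d(dims, location, yaw), *intrinsics)
    return sum((u - ku) ** 2 + (v - kv) ** 2
               for (u, v), (ku, kv) in zip(projected, keypoints_2d))


def estimate_depth(dims, x, y, yaw, keypoints_2d, intrinsics, candidates) -> float:
    """Candidate depth with the lowest keypoint reprojection error (grid search)."""
    return min(candidates,
               key=lambda z: keypoint_residual(dims, (x, y, z), yaw, keypoints_2d, intrinsics))


if __name__ == "__main__":
    intrinsics = (720.0, 720.0, 610.0, 175.0)  # hypothetical fx, fy, cx, cy
    dims, yaw = (1.5, 1.6, 3.9), 0.4           # hypothetical car: (h, w, l) and yaw
    keypoints = project(box_corners_3d(dims, (2.0, 1.7, 20.0), yaw), *intrinsics)
    candidates = [5.0 + 0.5 * i for i in range(71)]  # 5 m to 40 m in 0.5 m steps
    print(estimate_depth(dims, 2.0, 1.7, yaw, keypoints, intrinsics, candidates))  # 20.0
```

The grid search is only there to show that the keypoint residual has a well-defined minimum at the true depth; a trained detector would instead regress the box parameters and use a residual like this as a loss term or a refinement objective.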