Abstract: Monocular 3D vehicle localization is an important task in Intelligent Transportation Systems (ITS) and Cooperative Vehicle Infrastructure Systems (CVIS), and is usually achieved by monocular 3D vehicle detection. However, monocular cameras cannot obtain depth information directly because of their inherent imaging mechanism, which makes monocular 3D tasks more challenging. Most current monocular 3D vehicle detection methods rely on 2D detectors and additional geometric modules, which reduces efficiency. In this paper, we propose CenterLoc3D, a 3D vehicle localization network for roadside monocular cameras that directly predicts the centroid and eight vertices in image space, together with the dimensions of the 3D bounding box, without 2D detectors. To improve the precision of 3D vehicle localization, we embed a weighted-fusion module and a loss with spatial constraints in CenterLoc3D. First, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Second, the vehicle type, centroid, eight vertices, and dimensions of the 3D vehicle bounding box are obtained by CenterLoc3D. Finally, the centroid in 3D world space is obtained by combining the camera calibration with the CenterLoc3D outputs, which yields the 3D vehicle localization. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras. Hence, we also propose a benchmark for this application, including a dataset (SVLD-3D), an annotation tool (LabelImg-3D), and evaluation metrics. Experimental validation shows that the proposed method achieves high accuracy and real-time performance. (Abstract shortened due to word limits; please see the article for more details.)
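
The final step of the pipeline above, mapping a predicted image-space centroid into 3D world space through the calibrated transformation, can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a standard pinhole model with a 3x4 projection matrix `P`, and it assumes the centroid's world height `z` is known (for instance, half of the predicted vehicle height above a ground plane at Z = 0). The function names, the intrinsics `K`, the camera pose, and the sample point are all hypothetical values chosen for illustration.

```python
import numpy as np


def world_to_image(P, xyz):
    """Project a 3D world point to pixel coordinates with a 3x4 matrix P."""
    p = P @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return p[:2] / p[2]


def image_to_world(P, uv, z):
    """Recover (X, Y, z) for pixel uv, assuming the world height z is known.

    From s*[u, v, 1]^T = P*[X, Y, Z, 1]^T, eliminating the scale s gives
    (P[0] - u*P[2]) . W = 0 and (P[1] - v*P[2]) . W = 0 with W = [X, Y, Z, 1],
    i.e. two linear equations in the two unknowns X and Y.
    """
    u, v = uv
    rows = np.stack([P[0] - u * P[2], P[1] - v * P[2]])
    A = rows[:, :2]                        # coefficients of X and Y
    b = -(rows[:, 2] * z + rows[:, 3])     # known terms moved to the right
    x, y = np.linalg.solve(A, b)
    return np.array([x, y, z])


# Hypothetical calibration (assumed values, not taken from the article).
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])

# World axes: X right, Y forward along the road, Z up.
# Camera axes: x right, y down, z along the optical axis.
R_axes = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])
pitch = np.deg2rad(15.0)                   # optical axis tilted 15 degrees downward
c, s = np.cos(pitch), np.sin(pitch)
R_pitch = np.array([[1.0, 0.0, 0.0],
                    [0.0, c, -s],
                    [0.0, s, c]])
R = R_pitch @ R_axes

cam_centre = np.array([0.0, 0.0, 8.0])     # camera mounted 8 m above the road
t = -R @ cam_centre
P = K @ np.column_stack([R, t])            # 3x4 projection matrix

# Round trip: a centroid 0.75 m above the road, 30 m ahead of the camera.
centroid_world = np.array([2.0, 30.0, 0.75])
centroid_pixel = world_to_image(P, centroid_world)
recovered = image_to_world(P, centroid_pixel, z=centroid_world[2])
print(np.round(recovered, 3))
```

Fixing the height is what makes the problem solvable: a single pixel only constrains a ray, so one extra constraint (here the known `z`) is needed to pick a point on it. The 2x2 system becomes singular only for pixels on the horizon line of the chosen plane.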

Joint Monocular 3D Car Shape Estimation and Landmark Localization via Cascaded Regression

Joint 3-D Shape Estimation and Landmark Localization from Monocular Cameras of Intelligent Vehicles

A Cross-Dimension Annotations Method for 3D Structural Facial Landmark Extraction

Robust Monocular 3D Car Shape Estimation from 2D Landmarks

2-Entity RANSAC for Robust Visual Localization in Changing Environment

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps

3D Shape Estimation from 2D Landmarks: A Convex Relaxation Approach

The 3D Reconstruction of Face Model with Active Structured Light and Stereo Vision Fusion

Joint Voxel and Coordinate Regression for Accurate 3D Facial Landmark Localization

Online Global Non-rigid Registration for 3D Object Reconstruction Using Consumer-level Depth Cameras

Geometric and appearance model based approach for head pose recovery in monocular image sequence

On 3D Face Reconstruction via Cascaded Regression in Shape Space

Joint Monocular 3D Vehicle Detection and Tracking

Joint Head Pose and Facial Landmark Regression from Depth Images

Monocular 3D object detection via estimation of paired keypoints for autonomous driving

Self-Supervised 3D Reconstruction and Ego-Motion Estimation Via On-Board Monocular Video

Hybrid Iteration and Optimization-based Three-dimensional Reconstruction for Space Non-Cooperative Targets with Monocular Vision and Sparse Lidar Fusion

CenterLoc3D: Monocular 3D Vehicle Localization Network for Roadside Surveillance Cameras

Multiview Facial Landmark Localization in RGB-D Images via Hierarchical Regression with Binary Patterns

Part-level Car Parsing and Reconstruction from Single Street View