Abstract: Monocular 3D vehicle localization is an important task in Intelligent Transportation Systems (ITS) and Cooperative Vehicle Infrastructure Systems (CVIS), and it is usually achieved through monocular 3D vehicle detection. However, a monocular camera cannot measure depth directly because of its imaging mechanism, which makes monocular 3D tasks more challenging. Most current monocular 3D vehicle detection methods rely on 2D detectors and additional geometric modules, which reduces efficiency. In this paper, we propose CenterLoc3D, a 3D vehicle localization network for roadside monocular cameras that directly predicts the centroid and eight vertices in image space, together with the dimensions of the 3D bounding box, without a 2D detector. To improve the precision of 3D vehicle localization, we embed a weighted-fusion module and a loss with spatial constraints in CenterLoc3D. First, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Second, the vehicle type, centroid, eight vertices, and dimensions of the 3D vehicle bounding box are obtained by CenterLoc3D. Finally, the centroid in 3D world space is obtained by combining the camera calibration with the CenterLoc3D output, which gives the 3D vehicle location. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras. Hence, we also propose a benchmark for this application that includes a dataset (SVLD-3D), an annotation tool (LabelImg-3D), and evaluation metrics. Experimental validation shows that the proposed method achieves high accuracy and real-time performance.
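The final step of the abstract, mapping a predicted image-space centroid back to world space through the calibration, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the calibration is available as a 3x4 projection matrix and that the centroid's world height is known (for example, half of a predicted vehicle height, with the road plane at Z = 0). The matrix `P_EXAMPLE` and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical 3x4 projection matrix (assumption, not from the paper):
# a camera 6 m above the road, tilted downward, focal length 1000 px,
# principal point (640, 360). World Z points up; the road is Z = 0.
P_EXAMPLE = [
    [1000.0, 512.0, -384.0, 2304.0],
    [0.0, -312.0, -1016.0, 6096.0],
    [0.0, 0.8, -0.6, 3.6],
]


def project(P, X, Y, Z):
    """Project a world point to pixel coordinates with a 3x4 matrix P."""
    w = [X, Y, Z, 1.0]
    x, y, s = (sum(row[i] * w[i] for i in range(4)) for row in P)
    return x / s, y / s


def backproject_at_height(P, u, v, z):
    """Recover the world point seen at pixel (u, v), assuming its height z is known.

    With Z fixed, s*[u, v, 1]^T = P*[X, Y, z, 1]^T reduces to two linear
    equations in X and Y, solved here with Cramer's rule.
    """
    a11 = P[0][0] - u * P[2][0]
    a12 = P[0][1] - u * P[2][1]
    a21 = P[1][0] - v * P[2][0]
    a22 = P[1][1] - v * P[2][1]
    b1 = u * (P[2][2] * z + P[2][3]) - (P[0][2] * z + P[0][3])
    b2 = v * (P[2][2] * z + P[2][3]) - (P[1][2] * z + P[1][3])
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: cannot solve for X and Y")
    X = (b1 * a22 - a12 * b2) / det
    Y = (a11 * b2 - a21 * b1) / det
    return X, Y, z


if __name__ == "__main__":
    # Round trip: project a known world centroid, then recover it.
    u, v = project(P_EXAMPLE, 2.0, 10.0, 0.75)
    print(backproject_at_height(P_EXAMPLE, u, v, 0.75))
```

The known-height assumption is what removes the depth ambiguity of a single camera in this sketch. The paper may combine the calibration and the network output differently.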

What problem does this paper attempt to address?

This paper addresses 3D vehicle localization with roadside monocular cameras in intelligent transportation systems (ITS) and cooperative vehicle-infrastructure systems (CVIS). Specifically, it asks how to localize vehicles in 3D in roadside scenes through monocular 3D vehicle detection without a 2D detector. The difficulty stems from the fact that a monocular camera cannot measure depth directly. Most current monocular 3D vehicle detection methods still rely on 2D detectors and additional geometric constraint modules to recover 3D vehicle information, which reduces efficiency. In addition, most existing research uses datasets captured from the vehicle-mounted perspective rather than the roadside perspective, which limits large-scale 3D perception.

To address these problems, the paper proposes CenterLoc3D, a 3D vehicle localization network designed for roadside monocular cameras. The network directly predicts the center point and eight vertices in image space, as well as the size of the 3D bounding box, without a 2D detector. To improve localization accuracy, the paper also proposes a multi-scale weighted-fusion module and a loss function with embedded spatial constraints. It further proposes a benchmark for this application, consisting of a dataset (SVLD-3D), an annotation tool (LabelImg-3D), and evaluation metrics.

The main contributions of the paper are:
1. A monocular 3D vehicle localization network, CenterLoc3D, for roadside surveillance cameras in traffic scenes, which directly predicts the projected vertices and sizes of 3D vehicle bounding boxes.
2. A weighted-fusion module for multi-scale feature fusion, which further enhances feature extraction.
3. A loss function with embedded spatial constraints, which improves the accuracy of 3D vehicle localization.
4. A benchmark for 3D vehicle localization in roadside monocular traffic scenes, including a dataset, an annotation tool, and evaluation metrics, to support further research in this field.

CenterLoc3D: Monocular 3D Vehicle Localization Network for Roadside Surveillance Cameras

Vehicle 3D Localization in Road Scenes via a Monocular Moving Camera

Multimodal Localization: Stereo over LiDAR Map

LocNet: Global Localization in 3D Point Clouds for Mobile Vehicles

Monocular Visual Object 3D Localization in Road Scenes

3D LiDAR-Based Global Localization Using Siamese Neural Network

Single-Camera and Inter-Camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features

TM3Loc: Tightly-Coupled Monocular Map Matching for High Precision Vehicle Localization

Monocular Vehicle Self-localization Method Based on Compact Semantic Map

An Efficient Vehicle Localization Method by Using Monocular Vision

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Three-Dimensional Detection Method of Autonomous Driving Platform Based on Roadside Monocular Camera

Monocular 3D Detection for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated Using 3D Results

Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding

Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image

Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs

Sparse Semantic Map-Based Monocular Localization in Traffic Scenes Using Learned 2D-3D Point-Line Correspondences

Monocular 3-D Vehicle Detection Using a Cascade Network for Autonomous Driving

Delving into Localization Errors for Monocular 3D Object Detection

Joint 3-D Shape Estimation and Landmark Localization from Monocular Cameras of Intelligent Vehicles

Image Guidance Based 3D Vehicle Detection in Traffic Scene