Abstract: A prior point cloud provides 3D environmental context, which enhances the capabilities of a monocular camera in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters in roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper proposes a novel approach for the automatic registration between prior point clouds and images from roadside scenes. The main idea is to render photorealistic grayscale views of the prior point cloud from specific perspectives, using point features such as RGB or intensity values. These generated views reduce the modality difference between images and prior point clouds, thereby improving the robustness and accuracy of the registration results. In particular, we specify an efficient algorithm, named neighbor rendering, for the rendering process. We then introduce a method for automatically estimating the initial guess using only rough guesses of the camera's position. Finally, we propose a procedure for iteratively refining the extrinsic parameters by minimizing the reprojection error for line features extracted from both generated and camera images using the Segment Anything Model (SAM). We assess our method using a self-collected dataset comprising eight cameras strategically positioned throughout the university campus. Experiments demonstrate our method's capability to automatically align the prior point cloud with roadside camera images, achieving a rotation accuracy of 0.202 degrees and a translation precision of 0.079 m. Furthermore, we validate our approach's effectiveness in visual applications by substantially improving monocular 3D object detection performance.

What problem does this paper attempt to address?

This paper aims to solve the problem of automatic registration between point clouds and images in roadside scenarios. Specifically, the paper focuses on how to estimate the extrinsic parameters of the camera in the roadside scene point cloud to achieve accurate alignment between the point cloud and the image. This problem is crucial for enhancing the capabilities of monocular cameras in downstream vision tasks (such as 3D object detection), but currently there is a lack of accurate and automated registration methods to estimate the extrinsic parameters of the camera, which limits the application potential of roadside cameras.

### Main contributions of the paper:

1. **Propose an automated image-to-point-cloud registration framework**: This framework utilizes views generated from point clouds to reduce data modality differences and achieves the best performance in real-world roadside environments.
2. **Introduce an efficient rendering method**: Called "neighbor rendering", this method can not only generate realistic views but also maintain 2D-3D correspondence, thereby improving the accuracy of registration.
3. **Apply the state-of-the-art segmentation model SAM**: Used to extract line features from the generated images and obtain the corresponding line features in the point cloud through 2D-3D correspondence, optimizing the extrinsic parameters to minimize the reprojection error.
4. **Verify the effectiveness of the method in practical applications**: By applying the registered image-point-cloud pairs to the roadside 3D object detection task, the practical effect of this framework is demonstrated.

### Technical details:

- **Neighbor rendering**: By considering the 3D points around the ray corresponding to each pixel, filtering out the background points, and fitting a plane based on the foreground points, the corresponding 3D points are calculated. This method effectively solves the problems caused by sparse and uneven distribution of point clouds.
- **Initial guess estimation**: Based on the rough camera position, by sampling camera poses, generating views, and using SuperGlue to match the views with the camera image, the initial guess is estimated.
- **Extrinsic parameter optimization**: By minimizing the reprojection error between the line features extracted from the point cloud and the image, the extrinsic parameters are optimized. Using the conversion between Lie groups and Lie algebras, the optimization problem is transformed into an unconstrained optimization problem, improving the optimization efficiency.

### Experimental results:

- **Registration accuracy**: On the self-collected dataset, the average translation error of this method is 0.079 meters, the rotation error is 0.202 degrees, the maximum translation error does not exceed 0.17 meters, and the maximum rotation error is approximately 0.3 degrees.
- **Ground distance measurement**: In terms of ground distance measurement, the maximum error of this method is 4.75%, the median error is 1.16%, and the root-mean-square error is 1.67%, which is significantly better than the existing state-of-the-art methods.

### Application examples:

- **3D object detection**: By using YOLOv7 for 2D object detection, estimating the bottom center of the detection box as the intersection of the vehicle rear end and the ground, and using the registered point cloud to obtain the 3D positions of these points, 3D object detection is achieved. Experimental results show that within the detection range of 250 meters, most measurement errors are controlled within 1.5%.

In conclusion, this paper proposes an efficient and accurate automatic registration method from roadside scene images to point clouds, which significantly improves the performance of monocular cameras in downstream tasks such as 3D object detection.
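
The neighbor rendering step summarized above (collect points around a pixel's ray, drop background points, fit a plane to the foreground, intersect the ray with that plane) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `pixel_ray` and `neighbor_render_point`, the fixed perpendicular search radius, and the depth-margin rule for separating foreground from background are all assumptions made for illustration, and the points are assumed to be already expressed in the camera frame.

```python
import numpy as np

def pixel_ray(K, u, v):
    """Unit direction (camera frame) of the ray through pixel (u, v)."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def neighbor_render_point(points_cam, ray, radius=0.05, depth_margin=0.5):
    """Estimate the 3D point seen along `ray` from nearby cloud points.

    points_cam   : (N, 3) points in the camera frame.
    radius       : assumed neighborhood size (perpendicular distance to the ray).
    depth_margin : assumed foreground rule, keep neighbors whose depth along
                   the ray is within this margin of the closest neighbor.
    Returns a 3-vector, or None if no stable estimate exists.
    """
    depth = points_cam @ ray                                   # distance along the ray
    perp = np.linalg.norm(points_cam - np.outer(depth, ray), axis=1)
    near = (depth > 0) & (perp < radius)                       # neighbors in front of the camera
    if near.sum() < 3:
        return None
    near_pts, near_depth = points_cam[near], depth[near]
    fg = near_pts[near_depth < near_depth.min() + depth_margin]  # drop background points
    if len(fg) < 3:
        return None
    # Least-squares plane through the foreground points: the normal is the
    # right singular vector with the smallest singular value.
    centroid = fg.mean(axis=0)
    normal = np.linalg.svd(fg - centroid)[2][-1]
    denom = normal @ ray
    if abs(denom) < 1e-6:                                      # ray nearly parallel to the plane
        return None
    t = (normal @ centroid) / denom                            # ray-plane intersection
    return t * ray if t > 0 else None
```

Calling `neighbor_render_point` once per pixel yields both a rendered value (for example the intensity of the nearest foreground point) and the 3D point behind that pixel, which is what keeps the 2D-3D correspondence available for the later optimization. A practical version would use a spatial index rather than scanning every point for every pixel.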
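
The extrinsic refinement described above can be sketched in the same spirit. The example below treats the pose update as an unconstrained 6-vector (a rotation vector mapped to a rotation matrix by the Rodrigues formula, plus a translation offset), which is a simplification of the full se(3) exponential map, and minimizes the signed pixel distance from projected 3D line points to their matched 2D image lines. The residual definition, the Gauss-Newton loop with a forward-difference Jacobian, and all function names are assumptions for illustration, not the paper's actual solver.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix exp([w]_x) for a rotation vector w (so(3) -> SO(3))."""
    theta = np.linalg.norm(w)
    Wx = np.array([[0.0, -w[2], w[1]],
                   [w[2], 0.0, -w[0]],
                   [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + Wx
    return (np.eye(3) + np.sin(theta) / theta * Wx
            + (1.0 - np.cos(theta)) / theta**2 * Wx @ Wx)

def residuals(xi, R, t, K, pts3d, lines2d):
    """Signed pixel distance of each projected 3D point to its matched 2D line.

    xi      : 6-vector update (rotation vector, translation) applied to (R, t).
    pts3d   : (N, 3) points sampled on 3D line features (point cloud frame).
    lines2d : (N, 3) image lines (a, b, c) with a^2 + b^2 = 1.
    """
    R_new = rodrigues(xi[:3]) @ R
    t_new = t + xi[3:]
    cam = pts3d @ R_new.T + t_new            # point cloud frame -> camera frame
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]              # perspective division
    return np.sum(lines2d[:, :2] * uv, axis=1) + lines2d[:, 2]

def refine_extrinsics(R0, t0, K, pts3d, lines2d, iters=20, eps=1e-6):
    """Gauss-Newton refinement over the unconstrained 6-vector update."""
    R, t = R0.copy(), t0.copy()
    for _ in range(iters):
        r = residuals(np.zeros(6), R, t, K, pts3d, lines2d)
        J = np.empty((len(r), 6))
        for j in range(6):                   # forward-difference Jacobian
            d = np.zeros(6)
            d[j] = eps
            J[:, j] = (residuals(d, R, t, K, pts3d, lines2d) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        R = rodrigues(step[:3]) @ R          # the update stays on the rotation manifold
        t = t + step[3:]
        if np.linalg.norm(step) < 1e-10:
            break
    return R, t
```

Because the update lives in a plain 6-dimensional vector space, no orthogonality constraint on the rotation matrix has to be enforced explicitly, which is the practical benefit of the Lie group and Lie algebra conversion mentioned above. With real, noisy line matches a robust loss and analytic Jacobians would likely be preferable to this plain least-squares loop.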

Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes

A Speedy Point Cloud Registration Method Based on Region Feature Extraction in Intelligent Driving Scene

Automatic Registration of Panoramic Images and Point Clouds in Urban Large Scenes Based on Line Features

Traffic Sign Based Point Cloud Data Registration with Roadside LiDARs in Complex Traffic Environments

Point-Based Neural Scene Rendering for Street Views

VI-eye: semantic-based 3D point cloud registration for infrastructure-assisted autonomous driving

A Novel Point Cloud Registration Method for Multimedia Communication in Automated Driving Metaverse

EFGHNet: A Versatile Image-to-Point Cloud Registration Network for Extreme Outdoor Environment

Multimodal Urban Remote Sensing Image Registration Via Roadcross Triangular Feature

APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction

Vehicle 3d Localization in Road Scenes VIA a Monocular Moving Camera

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting

Automatic Background Construction and Object Detection Based on Roadside LiDAR

EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera

Attention-Based Road Registration for GPS-Denied UAS Navigation

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

CenterLoc3D: Monocular 3D Vehicle Localization Network for Roadside Surveillance Cameras

Efficient Pairwise 3-D Registration of Urban Scenes via Hybrid Structural Descriptors

Automatic Registration Method of Multi-Source Point Clouds Based on Building Facades Matching in Urban Scenes

3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Monocular Visual Object 3D Localization in Road Scenes