Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes

Yu Sheng, Lu Zhang, Xingchen Li, Yifan Duan, Yanyong Zhang, Yu Zhang, Jianmin Ji
2024-04-08
Abstract: Prior point clouds provide 3D environmental context, which enhances the capabilities of monocular cameras in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters in roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper proposes a novel approach for the automatic registration between prior point clouds and images from roadside scenes. The main idea involves rendering photorealistic grayscale views taken at specific perspectives from the prior point cloud with the help of its features, such as RGB or intensity values. These generated views reduce the modality differences between images and prior point clouds, thereby improving the robustness and accuracy of the registration results. In particular, we specify an efficient algorithm, named neighbor rendering, for the rendering process. We then introduce a method for automatically estimating the initial guess using only rough guesses of the camera's position. Finally, we propose a procedure for iteratively refining the extrinsic parameters by minimizing the reprojection error for line features extracted from both generated and camera images using the Segment Anything Model (SAM). We assess our method using a self-collected dataset, comprising eight cameras strategically positioned throughout the university campus. Experiments demonstrate our method's capability to automatically align the prior point cloud with the roadside camera image, achieving a rotation accuracy of 0.202 degrees and a translation precision of 0.079 m. Furthermore, we validate our approach's effectiveness in visual applications by substantially improving monocular 3D object detection performance.
Robotics
What problem does this paper attempt to address?
This paper aims to solve the problem of automatic registration between point clouds and images in roadside scenarios. Specifically, the paper focuses on how to estimate the extrinsic parameters of the camera in the roadside scene point cloud to achieve accurate alignment between the point cloud and the image. This problem is crucial for enhancing the capabilities of monocular cameras in downstream vision tasks (such as 3D object detection), but currently there is a lack of accurate and automated registration methods to estimate the extrinsic parameters of the camera, which limits the application potential of roadside cameras.

### Main contributions of the paper:

1. **Propose an automated image-to-point-cloud registration framework**: This framework utilizes views generated from point clouds to reduce data modality differences and achieves the best performance in real-world roadside environments.
2. **Introduce an efficient rendering method**: Called "neighbor rendering", this method can not only generate realistic views but also maintain 2D-3D correspondence, thereby improving the accuracy of registration.
3. **Apply the state-of-the-art segmentation model SAM**: Used to extract line features from the generated images and obtain the corresponding line features in the point cloud through 2D-3D correspondence, optimizing the extrinsic parameters to minimize the reprojection error.
4. **Verify the effectiveness of the method in practical applications**: By applying the registered image-point-cloud pairs to the roadside 3D object detection task, the practical effect of this framework is demonstrated.

### Technical details:

- **Neighbor rendering**: By considering the 3D points around the ray corresponding to each pixel, filtering out the background points, and fitting a plane based on the foreground points, the corresponding 3D points are calculated. This method effectively solves the problems caused by sparse and uneven distribution of point clouds.
- **Initial guess estimation**: Based on the rough camera position, by sampling camera poses, generating views, and using SuperGlue to match the views with the camera image, the initial guess is estimated.
- **Extrinsic parameter optimization**: By minimizing the reprojection error between the line features extracted from the point cloud and the image, the extrinsic parameters are optimized. Using the conversion between Lie groups and Lie algebras, the optimization problem is transformed into an unconstrained optimization problem, improving the optimization efficiency.

### Experimental results:

- **Registration accuracy**: On the self-collected dataset, the average translation error of this method is 0.079 meters, the rotation error is 0.202 degrees, the maximum translation error does not exceed 0.17 meters, and the maximum rotation error is approximately 0.3 degrees.
- **Ground distance measurement**: In terms of ground distance measurement, the maximum error of this method is 4.75%, the median error is 1.16%, and the root-mean-square error is 1.67%, which is significantly better than the existing state-of-the-art methods.

### Application examples:

- **3D object detection**: By using YOLOv7 for 2D object detection, estimating the bottom center of the detection box as the intersection of the vehicle rear end and the ground, and using the registered point cloud to obtain the 3D positions of these points, 3D object detection is achieved. Experimental results show that within the detection range of 250 meters, most measurement errors are controlled within 1.5%.

In conclusion, this paper proposes an efficient and accurate automatic registration method from roadside scene images to point clouds, which significantly improves the performance of monocular cameras in downstream tasks such as 3D object detection.
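
The neighbor rendering step summarized above (gather points near a pixel's ray, drop the background, fit a plane to the foreground, and take the 3D point from that plane) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `render_pixel_point`, the `radius` and `depth_gap` thresholds, and the depth-based foreground rule are all assumptions made for illustration.

```python
import numpy as np

def render_pixel_point(origin, direction, points, radius=0.1, depth_gap=0.5):
    """Estimate the 3D point seen along one pixel ray (illustrative sketch).

    origin, direction: ray start and direction, shape (3,)
    points: prior point cloud, shape (N, 3)
    radius: assumed max perpendicular distance for a point to count as a neighbor
    depth_gap: assumed depth margin separating foreground from background
    """
    d = direction / np.linalg.norm(direction)
    rel = points - origin
    depth = rel @ d                                   # distance along the ray
    perp = np.linalg.norm(rel - np.outer(depth, d), axis=1)
    near = (perp < radius) & (depth > 0)              # neighbors in front of the camera
    if near.sum() < 3:
        return None                                   # too few neighbors to fit a plane
    # Keep only the foreground: neighbors close in depth to the nearest one.
    fg = points[near][depth[near] < depth[near].min() + depth_gap]
    if len(fg) < 3:
        return None
    # Least-squares plane through the foreground points via SVD:
    # the normal is the right singular vector with the smallest singular value.
    centroid = fg.mean(axis=0)
    normal = np.linalg.svd(fg - centroid)[2][-1]
    denom = normal @ d
    if abs(denom) < 1e-9:
        return None                                   # ray is parallel to the plane
    t = normal @ (centroid - origin) / denom
    return origin + t * d                             # ray-plane intersection
```

Because the returned point lies on the pixel's ray, each rendered pixel keeps an explicit 3D counterpart, which is the 2D-3D correspondence the summary refers to. Intersecting with a fitted plane, rather than picking the nearest point, is what makes the result tolerant of sparse or uneven sampling.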
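
The extrinsic refinement described above relies on parameterizing the pose update in the Lie algebra so that the optimizer works on an unconstrained 6-vector. The sketch below shows one common way to do this; it is an assumption-laden illustration, not the paper's code. The names `se3_exp`, `line_residuals`, and `refine` are hypothetical, 2D lines are assumed to be given as normalized coefficients `(a, b, c)` with `a*u + b*v + c = 0`, and the Gauss-Newton loop with a finite-difference Jacobian is a stand-in for whatever solver the authors use.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def se3_exp(xi):
    """Map a 6-vector xi = (rho, phi) in se(3) to a 4x4 transform in SE(3)."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:                                  # first-order small-angle fallback
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        a = np.sin(theta) / theta
        b = (1 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * K + b * K @ K             # Rodrigues formula
        V = np.eye(3) + b * K + c * K @ K
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def line_residuals(xi, T0, K_cam, pts3d, lines2d):
    """Signed pixel distance of each projected 3D point to its matched 2D line."""
    T = se3_exp(xi) @ T0                              # left-multiplied update of the extrinsics
    cam = (T[:3, :3] @ pts3d.T).T + T[:3, 3]          # points in the camera frame
    uv = (K_cam @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                       # pinhole projection
    return np.sum(lines2d[:, :2] * uv, axis=1) + lines2d[:, 2]

def refine(T0, K_cam, pts3d, lines2d, iters=10, eps=1e-6):
    """Gauss-Newton on the 6-vector update, with a finite-difference Jacobian."""
    T = T0.copy()
    for _ in range(iters):
        r = line_residuals(np.zeros(6), T, K_cam, pts3d, lines2d)
        J = np.stack([(line_residuals(eps * e, T, K_cam, pts3d, lines2d) - r) / eps
                      for e in np.eye(6)], axis=1)
        step = np.linalg.lstsq(J, -r, rcond=None)[0]  # unconstrained step in se(3)
        T = se3_exp(step) @ T                         # map back onto SE(3)
    return T
```

Here `pts3d` would hold points sampled on the 3D line features and `lines2d` the matched image lines, one row per point. Since every update passes through `se3_exp`, the rotation block stays a valid rotation without explicit orthogonality constraints, which is the practical benefit of the Lie-algebra formulation.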
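
The registration accuracy figures above are stated as a rotation error in degrees and a translation error in meters. The summary does not say how these are computed, so the following is only a sketch of one widely used convention: the geodesic angle of the relative rotation and the Euclidean distance between translations. The function names and the `rot_z` helper are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angle (degrees) of the relative rotation between estimate and ground truth."""
    R_delta = R_est.T @ R_gt
    # trace(R) = 1 + 2*cos(angle); clip guards against round-off outside [-1, 1].
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

def translation_error_m(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))

def rot_z(deg):
    """Rotation about the z-axis by the given angle in degrees (synthetic test helper)."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])
```

For a synthetic check, `rotation_error_deg(rot_z(0.5), np.eye(3))` recovers the 0.5 degree offset up to floating-point precision. Averaging these two quantities over all cameras would give summary numbers of the same kind as those reported, under the assumed convention.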