TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
2024-07-13
Abstract: Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. 3D Gaussian's properties are not only initialized in alignment with the 3D mesh which provides more completed 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning of a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS in resolution of 1920x1280 (Waymo), and 120 FPS in resolution of 1600x900 (nuScenes) in urban scenarios.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses precise scene modeling and real-time rendering in autonomous driving scenarios. It proposes a tightly coupled LiDAR-camera Gaussian Splatting (TCLC-GS) method that leverages the combined strengths of LiDAR and camera sensors for high-quality 3D reconstruction and novel view RGB/depth synthesis.

#### Main Issues:

1. **Limitations of Existing Methods**: Most methods based on 3D Gaussian Splatting (3D-GS) initialize the 3D Gaussians directly from LiDAR points. This underutilizes the LiDAR data and overlooks the potential advantages of fusing LiDAR with camera data.
2. **Utilization of Geometric Information**: Using LiDAR points only to set the initial positions of the 3D Gaussians does not extract the rich geometric information contained in the 3D points.
3. **Modeling in Sparse Views**: Modeling and real-time rendering of large-scale urban environments from sparse views remain challenging.

### Solutions

1. **Hybrid 3D Representation**: TCLC-GS builds a 3D representation from LiDAR-camera data that combines an explicit component (a colorized 3D mesh) with an implicit component (hierarchical octree features) to enrich the geometric and appearance properties of the 3D Gaussians.
2. **Initialization and Optimization Process**: The 3D Gaussians are initialized in alignment with the colorized 3D mesh, which provides more complete shape and color information, and are enriched with broader contextual information through implicit features retrieved from the octree.
3. **Dense Depth Supervision**: Dense depth rendered from the 3D mesh serves as a supervision signal during Gaussian Splatting optimization, which helps the model learn robust geometry.

### Experimental Results

1. **Waymo Open Dataset**: On the Waymo dataset, TCLC-GS significantly outperforms the 3D-GS baseline in image synthesis (PSNR, SSIM, LPIPS) and depth synthesis (AbsRel, RMSE, RMSElog).
2. **nuScenes Dataset**: On the nuScenes dataset, TCLC-GS also performs strongly in both image and depth synthesis.

With these improvements, TCLC-GS achieves more accurate 3D reconstruction and higher-quality novel view RGB/depth synthesis, and it renders RGB and depth in real time on a single NVIDIA RTX 3090 Ti (90 FPS at 1920x1280 on Waymo and 120 FPS at 1600x900 on nuScenes).
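
For readers unfamiliar with the metrics named in the experimental results, the following is a minimal sketch of how AbsRel, RMSE, RMSElog, and PSNR are commonly defined. It is not the paper's evaluation code: the function names `depth_metrics` and `psnr`, the flat-list inputs, and the rule that non-positive depths are treated as invalid pixels are assumptions made for illustration, and the paper's exact protocol (depth caps, masking) is not stated in this summary.

```python
import math


def depth_metrics(pred, gt):
    """Return (AbsRel, RMSE, RMSElog) for two flat sequences of depths.

    Assumption for this sketch: a pixel is valid only if both the predicted
    and the ground-truth depth are strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pixels")
    n = len(pairs)
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Root mean squared error between log-depths.
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    return abs_rel, rmse, rmse_log


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for equal-length intensity lists."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    print(depth_metrics([2.0, 4.0], [1.0, 4.0]))
    print(psnr([0.9, 1.0], [1.0, 1.0]))
```

Lower is better for the three depth metrics, while higher is better for PSNR. SSIM and LPIPS are omitted here because they need windowed statistics and a pretrained network, respectively.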