Yuan Ren, Guile Wu, Runhao Li, Zheyuan Yang, Yibo Liu, Xingxin Chen, Tongtong Cao, Bingbing Liu
Abstract: Urban scene reconstruction is crucial for real-world autonomous driving simulators. Although existing methods have achieved photorealistic reconstruction, they mostly focus on pinhole cameras and neglect fisheye cameras. In fact, how to effectively simulate fisheye cameras in driving scenes remains an unsolved problem. In this work, we propose UniGaussian, a novel approach that learns a unified 3D Gaussian representation from multiple camera models for urban scene reconstruction in autonomous driving. Our contributions are two-fold. First, we propose a new differentiable rendering method that distorts 3D Gaussians using a series of affine transformations tailored to fisheye camera models. This addresses the compatibility issue of 3D Gaussian splatting with fisheye cameras, which is hindered by light ray distortion caused by lenses or mirrors. In addition, our method maintains real-time rendering while ensuring differentiability. Second, built on the differentiable rendering method, we design a new framework that learns a unified Gaussian representation from multiple camera models. By applying affine transformations to adapt to different camera models and regularizing the shared Gaussians with supervision from different modalities, our framework learns a unified 3D Gaussian representation with input data from multiple sources and achieves holistic driving scene understanding. As a result, our approach models multiple sensors (pinhole and fisheye cameras) and modalities (depth, semantic, normal and LiDAR point clouds). Our experiments show that our method achieves superior rendering quality and fast rendering speed for driving scene simulation.
What problem does this paper attempt to address?
The paper addresses how to reconstruct driving scenes in 3D efficiently and photorealistically from multiple camera models, in particular pinhole and fisheye cameras, for autonomous driving. Existing methods focus mainly on pinhole cameras and leave fisheye camera simulation largely unaddressed, even though fisheye cameras are important for navigation and perception in driving because of their wide field of view (FOV).
### Main Challenges
1. **Compatibility of Fisheye Cameras**: 3D Gaussian Splatting (3DGS) projects each 3D Gaussian onto the image with a pinhole perspective model and a local affine approximation. Fisheye lenses or mirrors distort light rays strongly and nonlinearly, so when 3DGS is applied directly to fisheye cameras, this affine transformation of the 3D Gaussians no longer holds.
2. **Integration of Multimodal Data**: The method must jointly process data from different sensors (pinhole and fisheye cameras) and modalities (depth, semantics, normals, and optional LiDAR point clouds) to achieve a holistic understanding of the driving scene.
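To make the compatibility problem concrete, the sketch below compares a unit-focal pinhole projection (r = tan θ) with an ideal equidistant fisheye (r = θ, i.e. the Kannala-Brandt model with all distortion coefficients set to zero) for a ray at angle θ from the optical axis. The equidistant lens and the function name are illustrative assumptions, not the paper's calibration. For a small Gaussian at that angle, it reports where each model places it and how much a pinhole-style splat would be stretched relative to the true fisheye image, radially and tangentially.

```python
import math


def pinhole_vs_fisheye(theta):
    """Compare a unit-focal pinhole (r = tan theta) with an equidistant fisheye
    (r = theta) for a ray at angle theta (rad) from the optical axis, 0 < theta < pi/2.
    Returns (pinhole radius, fisheye radius, radial gain ratio, tangential gain ratio)."""
    r_pin, r_fish = math.tan(theta), theta
    radial_ratio = 1.0 / math.cos(theta) ** 2   # (d/dt tan t) / (d/dt t)
    tangential_ratio = r_pin / r_fish           # tangential gain scales with image radius
    return r_pin, r_fish, radial_ratio, tangential_ratio


for deg in (10, 30, 60, 85):
    r_pin, r_fish, rad, tang = pinhole_vs_fisheye(math.radians(deg))
    print(f"{deg:>2} deg: r_pin={r_pin:6.2f} r_fish={r_fish:4.2f} "
          f"radial x{rad:6.2f} tangential x{tang:5.2f}")
```

Rays at or beyond 90° have no pinhole projection at all, while the equidistant model still maps them to a finite radius. Even within 90°, the radial ratio 1/cos²θ reaches 4 at 60°, so reusing a pinhole rasterizer unchanged would both misplace and over-stretch off-axis Gaussians.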
### Solutions
To address these problems, the authors propose UniGaussian, whose main contributions are as follows:
1. **New Differentiable Rendering Method**:
- A new differentiable rendering method adjusts each 3D Gaussian with a series of affine transformations (translation, rotation, and stretching) so that it fits the fisheye camera model. This resolves the incompatibility between 3D Gaussian Splatting and fisheye cameras while keeping rendering real-time.
- The Kannala-Brandt and MEI models are used to describe fisheye cameras mathematically, and the corresponding deformation formulas for the 3D Gaussians are derived.
2. **Unified 3D Gaussian Representation Framework**:
- A new framework applies these affine transformations to adapt the shared 3D Gaussians to each camera model and regularizes them with supervision from different modalities, so that a single, unified 3D Gaussian representation is learned.
- This framework can handle multiple sensors (pinhole cameras and fisheye cameras) and modalities (depth, semantics, normals, and optional LiDAR point clouds), achieving a comprehensive understanding of the driving scenario.
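For reference, here is a minimal sketch of the two fisheye projection models named above, written in their common textbook forms. Kannala-Brandt maps the incidence angle θ to a distorted angle θ_d = θ(1 + k1θ² + k2θ⁴ + k3θ⁶ + k4θ⁸); the MEI (unified omnidirectional) model projects the point onto the unit sphere, shifts the projection center by ξ, and then applies radial-tangential distortion. The class names, field names, and parameter layout are illustrative assumptions; the paper's exact parameterization may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class KannalaBrandt:
    """Kannala-Brandt fisheye model: theta_d = theta * (1 + k1 t^2 + k2 t^4 + k3 t^6 + k4 t^8)."""
    fx: float
    fy: float
    cx: float
    cy: float
    k: tuple = (0.0, 0.0, 0.0, 0.0)  # k1..k4; all zero gives the equidistant model

    def project(self, x, y, z):
        rho = math.hypot(x, y)
        if rho == 0.0:
            return self.cx, self.cy       # point on the optical axis
        theta = math.atan2(rho, z)        # incidence angle, valid beyond 90 degrees
        t2 = theta * theta
        k1, k2, k3, k4 = self.k
        theta_d = theta * (1 + k1 * t2 + k2 * t2**2 + k3 * t2**3 + k4 * t2**4)
        return self.fx * theta_d * x / rho + self.cx, self.fy * theta_d * y / rho + self.cy


@dataclass
class Mei:
    """MEI unified model: unit sphere, center shifted by xi, then radial-tangential distortion."""
    xi: float
    gamma1: float
    gamma2: float
    u0: float
    v0: float
    k1: float = 0.0
    k2: float = 0.0
    p1: float = 0.0
    p2: float = 0.0

    def project(self, x, y, z):
        n = math.sqrt(x * x + y * y + z * z)
        xs, ys, zs = x / n, y / n, z / n                       # point on the unit sphere
        mx, my = xs / (zs + self.xi), ys / (zs + self.xi)      # shifted normalized plane
        r2 = mx * mx + my * my
        radial = 1 + self.k1 * r2 + self.k2 * r2 * r2
        xd = mx * radial + 2 * self.p1 * mx * my + self.p2 * (r2 + 2 * mx * mx)
        yd = my * radial + self.p1 * (r2 + 2 * my * my) + 2 * self.p2 * mx * my
        return self.gamma1 * xd + self.u0, self.gamma2 * yd + self.v0
```

With k = (0, 0, 0, 0) the Kannala-Brandt model reduces to the equidistant model r = fθ, and with ξ = 0 and no distortion the MEI model reduces to a pinhole camera; both make convenient sanity checks. Unlike a pinhole, both models return finite image coordinates for a ray at exactly 90° (for MEI, as long as ξ > 0).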
### Method Overview
- **Mathematical Model**: The fisheye camera models (Kannala-Brandt and MEI) are described in detail, and the transformation of a 3D Gaussian under light-ray distortion is derived.
- **3D Gaussian Position and Pose Adjustment**: The position and orientation of each 3D Gaussian are adjusted by rotating the vector from the camera center to the Gaussian center.
- **3D Gaussian Compression**: To accommodate the wide field of view of fisheye cameras, 3D Gaussians are compressed along the polar and tangential directions.
- **Update of Scaling Factors and Rotation**: After the covariance matrix of each 3D Gaussian is updated, it is eigendecomposed to recover new scaling factors and rotation quaternions.
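The four steps above can be sketched for the simplest case, an ideal unit-focal equidistant fisheye (r = θ). This is a first-order construction under stated assumptions, not the paper's exact formulas, and every function name is hypothetical. Rotating the camera-to-Gaussian vector from angle θ to θ_v = arctan θ moves the Gaussian to where a pinhole (r = tan θ) would image it at the fisheye radius. Scaling by cos²θ_v along the polar direction and by sin θ_v / sin θ along the tangential direction then makes the pinhole Jacobian at the new position reproduce the fisheye Jacobian at the old one, so an unmodified pinhole splatter draws the same 2D ellipse to first order. The new covariance is eigendecomposed (cyclic Jacobi here, as a stand-in for whatever solver the authors use) into per-axis scales and a (w, x, y, z) quaternion.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation_about(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about the unit vector `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[c + x * x * t, x * y * t - z * s, x * z * t + y * s],
            [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
            [z * x * t - y * s, z * y * t + x * s, c + z * z * t]]


def adjust_gaussian(mu, cov):
    """Hypothetical first-order mapping for a unit-focal equidistant fisheye (r = theta):
    returns a virtual mean/covariance whose pinhole (r = tan theta) splat matches the
    fisheye splat of (mu, cov). mu is in camera coordinates with 0 <= theta < pi."""
    d = math.sqrt(sum(c * c for c in mu))
    e_r = [c / d for c in mu]                     # unit ray to the Gaussian center
    rho = math.hypot(e_r[0], e_r[1])              # sin(theta)
    if rho < 1e-9:                                # on-axis: both Jacobians coincide
        return list(mu), [row[:] for row in cov]
    theta = math.atan2(rho, e_r[2])
    theta_v = math.atan(theta)                    # virtual angle: tan(theta_v) == theta
    e_phi = [-e_r[1] / rho, e_r[0] / rho, 0.0]    # tangential direction (rotation axis)
    e_theta = [e_r[2] * e_r[0] / rho, e_r[2] * e_r[1] / rho, -rho]  # polar direction
    k_polar = math.cos(theta_v) ** 2              # cancels the pinhole's sec^2 radial gain
    k_tan = math.sin(theta_v) / rho               # matches the fisheye tangential gain
    # Scale in the local (polar, tangential, radial) frame, then rotate toward the axis.
    scale = [[k_polar * e_theta[i] * e_theta[j] + k_tan * e_phi[i] * e_phi[j]
              + e_r[i] * e_r[j] for j in range(3)] for i in range(3)]
    rot = rotation_about(e_phi, theta_v - theta)
    a = matmul(rot, scale)
    new_mu = [sum(rot[i][k] * mu[k] for k in range(3)) for i in range(3)]
    return new_mu, matmul(matmul(a, cov), transpose(a))


def eigh_sym3(m, sweeps=30):
    """Cyclic Jacobi iteration for a symmetric 3x3 matrix: m = V diag(w) V^T."""
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if abs(a[p][q]) < 1e-18:
                continue
            phi = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
            j = [[float(i == k) for k in range(3)] for i in range(3)]
            j[p][p] = j[q][q] = math.cos(phi)
            j[p][q], j[q][p] = math.sin(phi), -math.sin(phi)
            a = matmul(transpose(j), matmul(a, j))  # zeroes a[p][q]
            v = matmul(v, j)
    return [a[i][i] for i in range(3)], v


def cov_to_scale_quat(cov):
    """Eigendecompose a covariance into per-axis scales and a unit quaternion (w, x, y, z)."""
    w, r = eigh_sym3(cov)
    det = (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
           - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
           + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))
    if det < 0:                                   # flip one eigenvector: proper rotation
        for row in r:
            row[2] = -row[2]
    scales = [math.sqrt(max(x, 0.0)) for x in w]
    t = r[0][0] + r[1][1] + r[2][2]
    if t > 0:
        s = 2.0 * math.sqrt(1.0 + t)
        q = (0.25 * s, (r[2][1] - r[1][2]) / s, (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s)
    elif r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        q = ((r[2][1] - r[1][2]) / s, 0.25 * s, (r[0][1] + r[1][0]) / s, (r[0][2] + r[2][0]) / s)
    elif r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        q = ((r[0][2] - r[2][0]) / s, (r[0][1] + r[1][0]) / s, 0.25 * s, (r[1][2] + r[2][1]) / s)
    else:
        s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
        q = ((r[1][0] - r[0][1]) / s, (r[0][2] + r[2][0]) / s, (r[1][2] + r[2][1]) / s, 0.25 * s)
    return scales, q
```

Because only each Gaussian's mean and covariance change, an unmodified pinhole rasterizer can be reused in this sketch. Since arctan θ < 90° for every θ, Gaussians lying more than 90° from the optical axis also receive a valid virtual position in front of the camera.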
### Experimental Verification
- **Geometric Error Analysis**: Experiments analyze the geometric error of the proposed rendering method to verify its effectiveness.
- **Fisheye Camera Simulation**: Experiments on the KITTI-360 dataset demonstrate the method's superior performance in fisheye camera simulation.
- **Multi-Camera Model Simulation**: The UniGaussian framework is evaluated with multiple camera models, showing its potential for simulating autonomous driving scenarios.
Through these improvements, UniGaussian not only solves the compatibility problem of fisheye cameras but also improves the quality and speed of driving scene reconstruction, providing a more powerful tool for autonomous driving simulation.