Zhi-Hao Lin, Bohan Liu, Yi-Ting Chen, Kuan-Sheng Chen, David Forsyth, Jia-Bin Huang, Anand Bhattad, Shenlong Wang
Abstract: We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, differing from NeRF's dense view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous 'floaters'. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
The paper "UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video" addresses inverse rendering of large-scale, unbounded urban scenes from a single video. It proposes UrbanIR (Urban Scene Inverse Rendering), an inverse graphics model that produces realistic free-viewpoint renderings under a range of lighting conditions from that one video.
### Main Contributions
1. **Inverse Rendering and Relighting**:
- UrbanIR recovers the scene's intrinsic properties (shape, albedo, and visibility) together with sun and sky illumination from a single video.
- Compared with existing methods, UrbanIR substantially reduces artifacts in sparse, wide-baseline view settings, especially on geometric structures such as rooftops.
2. **Nighttime Simulation**:
- The paper proposes a physics-based nighttime simulation framework that produces controllable, realistic, physically plausible, and consistent nighttime simulations from a single daytime video.
- The authors present this as the first method to achieve free-viewpoint nighttime simulation from a single daytime video.
3. **Large-Scale Scene Handling**:
- The authors describe UrbanIR as the first method capable of inverse rendering and relighting of large-scale outdoor scenes from a single monocular video.
- Unlike some prior methods, UrbanIR does not require additional inputs such as multiple light sources, depth sensing, or LiDAR.
### Method Overview
- **Scene Representation**:
- Uses a spatial hash-based voxel NeRF representation (Instant-NGP) to encode the scene's albedo, normals, semantics, and visibility.
- The scene model implicitly encodes the geometry of the scene, providing compact and consistent appearance modeling.
- **Lighting Model**:
- Employs a parametric sun-sky model of outdoor lighting, parameterized by the sun color, the sun's azimuth and zenith angles, the ambient light, and the sky texture.
- **Rendering**:
- Volume-renders the scene's intrinsic properties and visibility into per-pixel maps on the image plane, then shades these maps to produce the final image.
- Shading combines a Blinn-Phong model with sun and sky illumination terms; the rendered visibility determines where sunlight is blocked, so shadows stay consistent as the lighting changes.
- **Inverse Graphics**:
- Trains the scene model and lighting model jointly by minimizing a combination of rendering, visibility, normal, semantic, and regularization losses.
- The visibility loss guides the recovery of geometric structure, which improves the synthesized shadows.
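The **Scene Representation** bullet relies on Instant-NGP's spatial hash grid. The sketch below illustrates that encoding in plain Python: each level hashes the eight voxel corners around a point with the XOR-of-primes hash from the Instant-NGP paper and trilinearly interpolates the stored features. The function names, level count, base resolution, and growth factor are illustrative assumptions rather than UrbanIR's configuration; the tables here are random instead of trained, and the real implementation runs in CUDA and uses a direct, collision-free index at coarse levels where the grid fits in the table.

```python
import itertools
import math
import random

# Per-axis primes from the spatial hash described in the Instant-NGP paper.
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(corner, table_size):
    """XOR of (integer coordinate * prime) over the axes, modulo the table size.

    Python ints never overflow; for a power-of-two table size this gives the
    same index as the 32-bit unsigned arithmetic used on the GPU.
    """
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= coord * prime
    return h % table_size


def make_tables(num_levels=4, table_size=2**12, feat_dim=2, seed=0):
    """One feature table per level, initialised uniformly in [-1e-4, 1e-4] (untrained here)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)] for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def hash_encode(x, tables, base_res=16, growth=1.5):
    """Concatenate trilinearly interpolated features from every level; x lies in [0, 1]^3."""
    features = []
    for level, table in enumerate(tables):
        res = math.floor(base_res * growth**level)
        scaled = [c * res for c in x]
        base = [math.floor(s) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        level_feat = [0.0] * len(table[0])
        # Visit the 8 corners of the voxel containing x.
        for offset in itertools.product((0, 1), repeat=3):
            corner = tuple(b + o for b, o in zip(base, offset))
            weight = math.prod(f if o else 1.0 - f for f, o in zip(frac, offset))
            entry = table[spatial_hash(corner, len(table))]
            level_feat = [acc + weight * e for acc, e in zip(level_feat, entry)]
        features.extend(level_feat)
    return features


features = hash_encode((0.3, 0.7, 0.1), make_tables())
print(len(features))  # 4 levels x 2 features = 8
```

In a model like UrbanIR, the concatenated feature vector would feed small MLPs (not shown) that predict density and per-point attributes such as albedo, normals, and semantics.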
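The sun part of the **Lighting Model** can be illustrated by turning the azimuth and zenith angles into a unit direction toward the sun. This is a hedged sketch: the class and field names are hypothetical, it assumes a z-up world with azimuth measured from +x toward +y, and a single ambient color stands in for the sky texture.

```python
import math
from dataclasses import dataclass


@dataclass
class SunSkyLight:
    """Hypothetical container for sun-sky lighting parameters."""

    sun_color: tuple  # linear RGB sun intensity
    azimuth: float    # radians, measured in the ground plane from +x toward +y
    zenith: float     # radians; 0 = directly overhead, pi/2 = on the horizon
    ambient: tuple    # RGB ambient term standing in for the sky texture

    def sun_direction(self):
        """Unit vector from the scene toward the sun in a z-up world."""
        s = math.sin(self.zenith)
        return (s * math.cos(self.azimuth), s * math.sin(self.azimuth), math.cos(self.zenith))

    def above_horizon(self):
        return self.zenith < math.pi / 2


noon = SunSkyLight((1.0, 1.0, 1.0), azimuth=0.0, zenith=0.0, ambient=(0.1, 0.1, 0.1))
print(noon.sun_direction())  # (0.0, 0.0, 1.0)
```

Under this parameterization, the **Changing Sun Angle** application reduces to editing `azimuth` and `zenith` and re-shading the scene.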
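The volumetric rendering step in the **Rendering** bullet can be sketched with the standard NeRF quadrature: each sample's opacity is alpha_i = 1 - exp(-sigma_i * delta_i), its weight is the accumulated transmittance times alpha_i, and any per-sample attribute is composited with those weights. The function below is a minimal single-ray illustration, not UrbanIR's code; the same routine would apply unchanged to albedo, normals, semantics, or visibility.

```python
import math


def composite(sigmas, deltas, values):
    """Alpha-composite per-sample attribute vectors along one ray (NeRF quadrature).

    sigmas: densities at the samples, deltas: distances between samples,
    values: one attribute vector per sample (e.g. albedo, normal, visibility).
    Returns the composited attribute and the per-sample weights.
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed
    out = [0.0] * len(values[0])
    weights = []
    for sigma, delta, value in zip(sigmas, deltas, values):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        out = [o + w * v for o, v in zip(out, value)]
        transmittance *= 1.0 - alpha
    return out, weights


# An opaque first sample hides everything behind it.
albedo, w = composite([1e3, 1e3], [1.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(albedo)  # [1.0, 0.0, 0.0]
```

Because the weights sum to at most 1, a composited normal is generally shorter than unit length; renormalizing it before shading is a common choice (an assumption here, not a detail from the summary).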
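The shading bullet can be made concrete with a Blinn-Phong sketch in which the sun's diffuse and specular terms are multiplied by the rendered visibility while the ambient sky term is left unshadowed. The exact way UrbanIR combines these terms, and the specular parameters `k_s` and `shininess`, are assumptions for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v) if length > 0 else tuple(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade(albedo, normal, view_dir, sun_dir, sun_color, ambient, visibility,
          k_s=0.1, shininess=32.0):
    """Blinn-Phong sun term gated by visibility, plus an unshadowed ambient sky term.

    view_dir points from the surface toward the camera; sun_dir toward the sun.
    k_s and shininess are illustrative specular parameters.
    """
    n = normalize(normal)
    l = normalize(sun_dir)
    h = normalize(tuple(a + b for a, b in zip(l, normalize(view_dir))))  # half vector
    diffuse = max(0.0, dot(n, l))
    specular = k_s * max(0.0, dot(n, h)) ** shininess if diffuse > 0 else 0.0
    return tuple(
        a * (visibility * s * diffuse + amb) + visibility * s * specular
        for a, s, amb in zip(albedo, sun_color, ambient)
    )


grey, up, white, sky = (0.5, 0.5, 0.5), (0, 0, 1), (1, 1, 1), (0.1, 0.1, 0.1)
lit = shade(grey, up, up, up, white, sky, visibility=1.0)
shadowed = shade(grey, up, up, up, white, sky, visibility=0.0)
```

Setting `visibility` to 0 leaves only the ambient term, which is why a wrong visibility estimate shows up directly as a missing or spurious shadow.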
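The **Inverse Graphics** bullets tie visibility to geometry. One way to see the link is to compute sun visibility from a density field by marching a ray toward the sun and taking the transmittance exp(-integral of sigma dt). In the sketch below, `blob_density` is a hypothetical sphere standing in for the learned field, and `visibility_consistency_loss` is an assumed squared-error form comparing a predicted visibility with the traced one; it is not presented as the paper's exact loss.

```python
import math


def sun_visibility(point, sun_dir, density_fn, t_near=0.05, t_far=10.0, num_samples=128):
    """Transmittance from `point` toward the sun: exp(-integral of density along the ray).

    Uses midpoint-rule ray marching; sun_dir is assumed to be a unit vector and
    t_near offsets the start so the surface does not shadow itself.
    """
    delta = (t_far - t_near) / num_samples
    optical_depth = 0.0
    for k in range(num_samples):
        t = t_near + (k + 0.5) * delta
        sample = tuple(p + t * d for p, d in zip(point, sun_dir))
        optical_depth += density_fn(sample) * delta
    return math.exp(-optical_depth)


def blob_density(x, center=(0.0, 0.0, 2.0), radius=0.5, sigma=50.0):
    """Hypothetical occluder: a sphere of constant density standing in for the learned field."""
    inside = sum((a - b) ** 2 for a, b in zip(x, center)) < radius**2
    return sigma if inside else 0.0


def visibility_consistency_loss(predicted, traced):
    """Assumed squared-error form comparing a predicted visibility with the traced one."""
    return (predicted - traced) ** 2


up = (0.0, 0.0, 1.0)
under_blob = sun_visibility((0.0, 0.0, 0.0), up, blob_density)  # close to 0: in shadow
open_sky = sun_visibility((3.0, 0.0, 0.0), up, blob_density)    # exactly 1.0: unoccluded
```

If the geometry is wrong, for example a missing roof, the traced visibility for a pixel that should be shadowed stays near 1, so a loss of this kind can push density into the missing occluder.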
### Applications
- **Changing Sun Angle**: Adjusting the sun's position renders the scene as it would appear at different times of day.
- **Day-Night Transition**: New light sources (such as street lamps and car headlights) can be inserted at night to generate realistic nighttime scenes.
- **Object Insertion**: Virtual objects can be inserted into the scene and cast realistic shadows.
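The **Day-Night Transition** application adds artificial lights. A minimal sketch, assuming isotropic point lamps with inverse-square falloff and a Lambertian surface (a common physically based choice, not necessarily the paper's exact light model), switches the sun off and sums each lamp's contribution on top of a dim night ambient term. Function names and parameters are hypothetical; `light_visibility` could come from a shadow ray traced toward the lamp and defaults to 1 here.

```python
import math


def point_light_radiance(albedo, normal, surface_point, light_pos, intensity, light_visibility=1.0):
    """Lambertian reflection of an isotropic point light with inverse-square falloff.

    Radiance = albedo / pi * intensity * cos(theta) / distance^2, scaled by a
    visibility factor in [0, 1] (1 = lamp not occluded).
    """
    to_light = tuple(l - p for l, p in zip(light_pos, surface_point))
    dist_sq = sum(c * c for c in to_light)
    dist = math.sqrt(dist_sq)
    n_len = math.sqrt(sum(c * c for c in normal))
    cos_theta = max(0.0, sum(n * c for n, c in zip(normal, to_light)) / (n_len * dist))
    return tuple(
        a / math.pi * i * cos_theta * light_visibility / dist_sq
        for a, i in zip(albedo, intensity)
    )


def night_pixel(albedo, normal, point, lamps, night_ambient):
    """Sun switched off: dim ambient light plus the sum over inserted lamps."""
    total = [a * amb for a, amb in zip(albedo, night_ambient)]
    for light_pos, intensity in lamps:
        contrib = point_light_radiance(albedo, normal, point, light_pos, intensity)
        total = [t + c for t, c in zip(total, contrib)]
    return tuple(total)


lamp = ((0.0, 0.0, 2.0), (4 * math.pi,) * 3)  # lamp 2 units above the surface point
pixel = night_pixel((1.0, 1.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), [lamp], (0.0, 0.0, 0.0))
```

Doubling the lamp's distance quarters its contribution, which is what makes street lamps produce localized pools of light rather than uniform brightening.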
### Experimental Results
- **Datasets**: Evaluations were conducted on KITTI-360 and Waymo Open Dataset.
- **Baseline Comparison**: UrbanIR was compared against FEGR, Instruct-NeRF2NeRF, NeRF-OSR, and RelightNet and performed favorably across the reported metrics.
- **Quantitative Evaluation**: UrbanIR outperformed the baselines on PSNR, SSIM, and LPIPS for both novel view synthesis and novel lighting conditions.
- **Qualitative Evaluation**: Qualitative results show UrbanIR removing existing shadows, changing shadows on building surfaces, and modifying sky textures.
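Of the metrics in the **Quantitative Evaluation** bullet, PSNR is the simplest to state: 10 * log10(MAX^2 / MSE) between a rendered image and the ground truth. The helper below assumes intensities in [0, 1] given as flat lists; SSIM (windowed local statistics) and LPIPS (distances in a pretrained network's feature space) need more machinery and are not shown.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized intensity lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val**2 / mse)


print(round(psnr([0.5] * 4, [0.6] * 4), 6))  # 20.0
```

Higher PSNR is better; every tenfold reduction in mean squared error adds 10 dB.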
### Limitations
- The optimization can be affected by noisy predictions from the prior models it relies on, so the loss functions require careful tuning.
- Shadows remain difficult to handle in some cases, especially in complex urban driving sequences.
Overall, UrbanIR makes significant progress in inverse rendering and relighting, particularly for large-scale urban scenes.