UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei
2024-07-29
Abstract: Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computations for optimization. In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and illumination jointly, thereby accelerating the inverse rendering significantly. To be specific, we propose to encode a scene into a latent volumetric representation, based on which the geometry, materials and illumination can be readily learned via lightweight neural networks in a unified manner. Particularly, an essential design of UniVoxel is that we leverage local Spherical Gaussians to represent the incident light radiance, which enables the seamless integration of modeling illumination into the unified voxelization framework. Such novel design enables our UniVoxel to model the joint effects of direct lighting, indirect lighting and light visibility efficiently without expensive multi-bounce ray tracing. Extensive experiments on multiple benchmarks covering diverse scenes demonstrate that UniVoxel boosts the optimization efficiency significantly compared to other methods, reducing the per-scene training time from hours to 18 minutes, while achieving favorable reconstruction quality. Code is available at https://github.com/freemantom/UniVoxel.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses inverse rendering: estimating the geometry, materials, and illumination of a 3D scene from multi-view 2D images, a core problem in computer vision and graphics. Existing inverse rendering methods typically model geometry, materials, and illumination with implicit neural representations, which are computationally expensive to optimize. Specifically, these methods usually model each property separately, rely on deep MLP networks, and require expensive multi-bounce ray tracing when modeling illumination, so optimization is slow and training can take several hours or even days. To address this, the paper proposes UniVoxel, a unified voxelization framework for explicitly learning scene representations. The framework models geometry, materials, and illumination jointly and efficiently, which significantly accelerates inverse rendering. The specific contributions are as follows:
1. **Unified Voxelization Framework**: UniVoxel encodes a scene into a latent volumetric representation from which lightweight neural networks learn all key scene properties (geometry, materials, and illumination) in a unified manner.
2. **Spherical Gaussian-Based Illumination Modeling**: UniVoxel represents the local incident light radiance with Spherical Gaussians (SG). This captures the joint effects of direct lighting, indirect lighting, and light visibility without multi-bounce ray tracing, integrates illumination modeling with the modeling of the other scene properties, and significantly improves training efficiency.
3. **Experimental Verification**: Extensive experiments on multiple benchmark datasets show that UniVoxel significantly improves optimization efficiency, reducing the training time per scene from several hours to 18 minutes while maintaining good reconstruction quality.
Through these innovations, UniVoxel addresses the low optimization efficiency of earlier inverse rendering methods and offers a faster, more practical option for real-world applications.
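To make the Spherical Gaussian idea more concrete, here is a minimal sketch of how incident radiance can be evaluated as a sum of SG lobes, using the standard lobe form G(ω) = μ · exp(λ(ω · ξ − 1)) with lobe axis ξ, sharpness λ, and RGB amplitude μ. This is an illustration only, not the authors' implementation: the function names and the lobe values are hypothetical, and in UniVoxel the lobe parameters would be predicted locally from the voxel representation rather than hard-coded.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def eval_sg(direction, axis, sharpness, amplitude):
    """Evaluate one SG lobe: amplitude * exp(sharpness * (dot(direction, axis) - 1)).

    direction, axis: 3D vectors (normalized here for safety).
    sharpness: non-negative scalar; larger values give a narrower lobe.
    amplitude: RGB triple giving the lobe's peak radiance.
    """
    d = normalize(direction)
    a = normalize(axis)
    cos_angle = sum(x * y for x, y in zip(d, a))
    weight = math.exp(sharpness * (cos_angle - 1.0))
    return tuple(weight * c for c in amplitude)

def incident_radiance(direction, lobes):
    """Sum the contributions of several SG lobes for one incoming direction.

    lobes: iterable of (axis, sharpness, amplitude) tuples. The values used
    below are made up for illustration.
    """
    total = [0.0, 0.0, 0.0]
    for axis, sharpness, amplitude in lobes:
        rgb = eval_sg(direction, axis, sharpness, amplitude)
        for i in range(3):
            total[i] += rgb[i]
    return tuple(total)

# Hypothetical lobes at one surface point: a bright, narrow lobe from above
# and a dim, broad lobe from the side.
example_lobes = [
    ((0.0, 0.0, 1.0), 10.0, (1.0, 1.0, 1.0)),
    ((1.0, 0.0, 0.0), 2.0, (0.2, 0.1, 0.1)),
]
radiance_from_above = incident_radiance((0.0, 0.0, 1.0), example_lobes)
```

Because each lobe is a closed-form function of direction, the radiance arriving from any direction is a handful of exponentials rather than a recursive ray trace, which is the kind of saving the summary attributes to the SG-based illumination model.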