MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field

Dongyu Yan, Guanyu Huang, Fengyu Quan, Haoyao Chen
2024-07-19
Abstract: Panoramic observation using fisheye cameras is significant in virtual reality (VR) and robot perception. However, panoramic images synthesized by traditional methods lack depth information and can only provide three degrees-of-freedom (3DoF) rotation rendering in VR applications. To fully preserve and exploit the parallax information within the original fisheye cameras, we introduce MSI-NeRF, which combines deep learning omnidirectional depth estimation and novel view synthesis. We construct a multi-sphere image as a cost volume through feature extraction and warping of the input images. We further build an implicit radiance field using spatial points and interpolated 3D feature vectors as input, which can simultaneously realize omnidirectional depth estimation and 6DoF view synthesis. Leveraging the knowledge from the depth estimation task, our method can learn scene appearance by source view supervision only. It does not require novel target views and can be trained conveniently on existing panorama depth estimation datasets. Our network has the generalization ability to reconstruct unknown scenes efficiently using only four images. Experimental results show that our method outperforms existing methods in both depth estimation and novel view synthesis tasks.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two limitations of conventional panoramic image synthesis: the output lacks depth information, and it supports only three-degrees-of-freedom (3DoF) rotational rendering. Panoramas synthesized by traditional methods discard the parallax present in the original fisheye images, so the final image carries no depth, which limits its usefulness in robotics and virtual reality (VR) applications. To overcome this, the paper proposes MSI-NeRF, which combines deep-learning-based omnidirectional depth estimation with novel view synthesis to achieve omnidirectional depth estimation and six-degrees-of-freedom (6DoF) view synthesis simultaneously.

### Main Contributions

1. **Synthesis of Omnidirectional Radiance Fields**: The paper proposes a deep-learning method that synthesizes an omnidirectional radiance field from only four fisheye input images. The traditional 2D panoramic output is extended to 3D while retaining the parallax information in the original images.
2. **Combination of Depth Estimation and Novel View Synthesis**: By leveraging the inductive bias of the multi-sphere image (MSI) representation, the network can be trained on both tasks using only commonly available depth supervision and the source views, without novel target views.
3. **Generalization across Scenes**: The pre-trained network generalizes to unseen scenes. The reported experiments show state-of-the-art performance in both depth estimation and novel view synthesis. The method is suited to a range of VR and robotics applications: it can help mitigate VR motion sickness and supports panoramic video editing and 3D reconstruction.

### Method Overview

1. **Multi-Sphere Image Construction**: Four wide-field-of-view fisheye images are passed through a 2D CNN with shared weights to extract feature maps. The features are then warped onto a multi-sphere image (MSI), which aggregates the multi-view information as a cost volume.
2. **Hybrid Neural Rendering**: A NeRF is used for viewpoint interpolation and rendering. It combines the geometric and appearance features interpolated from the MSI volume with the projected color information to produce the final occupancy and color outputs.
3. **Multi-Task Supervision**: The network is trained with both depth and color supervision, which enables it to generalize across scenes. Depth supervision provides geometric guidance, and color supervision provides texture information.

### Experimental Results

- **Depth Estimation**: Experiments on the OmniHouse and OmniThings datasets show that the method produces fine-grained depth estimates and avoids overfitting while maintaining overall accuracy.
- **Novel View Synthesis**: Experiments on the Replica360 dataset show that the method generates high-quality, detail-rich images in the novel view synthesis task while avoiding blurring and ghosting artifacts.

### Conclusion

By combining deep-learning-based depth estimation with NeRF, MSI-NeRF addresses both the missing depth information in panoramic image synthesis and the restriction to 3DoF rotational rendering. It achieves omnidirectional depth estimation and 6DoF view synthesis from four fisheye images and has broad application prospects.
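
To make the multi-sphere geometry and the rendering step from the method overview more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the uniform inverse-depth spacing of the spheres and the equirectangular axis convention are assumed rather than taken from the paper, and the compositing uses standard NeRF-style densities instead of the occupancy output mentioned above. The per-sample densities and colors, which the network would predict, are supplied by hand.

```python
import math


def sphere_radii(num_spheres, r_min, r_max):
    """Radii of concentric spheres, spaced uniformly in inverse depth (assumed)."""
    if num_spheres < 2:
        raise ValueError("need at least two spheres")
    inv_near, inv_far = 1.0 / r_min, 1.0 / r_max
    step = (inv_far - inv_near) / (num_spheres - 1)
    return [1.0 / (inv_near + step * i) for i in range(num_spheres)]


def equirect_direction(u, v, width, height):
    """Unit ray direction for pixel (u, v) of an equirectangular panorama.

    Assumed convention: y points up, z points toward the panorama center.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def composite_ray(radii, densities, colors):
    """Alpha-composite the per-sphere samples of one ray leaving the rig center.

    Returns the expected depth (weighted sum of radii) and the blended RGB color.
    """
    transmittance = 1.0
    depth = 0.0
    rgb = [0.0, 0.0, 0.0]
    for i, (radius, sigma, color) in enumerate(zip(radii, densities, colors)):
        # Spacing to the next sphere; the outermost sample reuses the previous spacing.
        if i + 1 < len(radii):
            delta = radii[i + 1] - radius
        else:
            delta = radius - radii[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        depth += weight * radius
        rgb = [acc + weight * ch for acc, ch in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return depth, tuple(rgb)


# Example: four spheres between 0.5 and 10 units, one ray with an opaque third sample.
radii = sphere_radii(4, 0.5, 10.0)
direction = equirect_direction(10, 5, 64, 32)
sample_points = [tuple(r * d for d in direction) for r in radii]
depth, rgb = composite_ray(
    radii,
    densities=[0.0, 0.0, 1e9, 0.0],
    colors=[(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)],
)
```

Because the same compositing weights yield both a depth value and a color, a single pass along each ray can serve depth supervision and color supervision at once. In this toy example all of the weight falls on the third sphere, so the rendered depth equals that sphere's radius and the color equals that sample's color.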