AG-NeRF: Attention-guided Neural Radiance Fields for Multi-height Large-scale Outdoor Scene Rendering

Jingfeng Guo, Xiaohan Zhang, Baozhu Zhao, Qi Liu
2024-04-18
Abstract: Existing neural radiance fields (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built on a single altitude. Moreover, they often require a priori knowledge of the camera shooting height and scene scope, leading to inefficient and impractical applications when the camera altitude changes. In this work, we propose an end-to-end framework, termed AG-NeRF, and seek to reduce the training cost of building good reconstructions by synthesizing free-viewpoint images based on varying altitudes of scenes. Specifically, to tackle the detail variation problem from low altitude (drone-level) to high altitude (satellite-level), a source image selection method and an attention-based feature fusion approach are developed to extract and fuse the most relevant features of the target view from multi-height images for high-fidelity rendering. Extensive experiments demonstrate that AG-NeRF achieves SOTA performance on the 56 Leonard and Transamerica benchmarks and requires only half an hour of training time to reach a PSNR competitive with the latest BungeeNeRF.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the limitations of existing Neural Radiance Field (NeRF) methods in rendering large-scale outdoor scenes captured at different altitudes. Specifically:

- **Detail differences caused by altitude variation**: Images captured at different altitudes exhibit significant detail differences. Images taken at low altitudes (e.g., drone level) contain more high-frequency details, while images taken at high altitudes (e.g., satellite level) mainly contain low-frequency details. This makes existing NeRF methods perform poorly when handling such multi-altitude images.
- **High training cost**: Existing methods like BungeeNeRF can reconstruct scenes at different altitudes but require multi-stage training, taking several days and necessitating complex parameter adjustments.
- **Requirement for camera altitude prior information**: Existing methods usually need prior knowledge of the camera's altitude to segment the training dataset, which is impractical in real-world applications.

To address these issues, this paper proposes an end-to-end framework called AG-NeRF. By selecting source images and using an attention-based feature fusion method, it extracts and fuses the most relevant features for the target view, achieving high-quality image rendering at different altitudes. Experimental results show that AG-NeRF not only surpasses existing methods in accuracy but also significantly reduces training time, achieving results comparable to BungeeNeRF in just half an hour.
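
The two ideas named above, selecting source images and fusing their features with attention, can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-camera selection rule, the scaled dot-product form of the attention, and all names (`select_sources`, `attention_fuse`, the toy positions and features) are assumptions made for illustration only.

```python
import math


def select_sources(target_pos, source_positions, k):
    """Return indices of the k source cameras closest to the target camera.

    Assumption: Euclidean nearest-camera selection stands in for the
    paper's source image selection method, which is not detailed here.
    """
    order = sorted(
        range(len(source_positions)),
        key=lambda i: math.dist(target_pos, source_positions[i]),
    )
    return order[:k]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, features):
    """Fuse per-source feature vectors with scaled dot-product attention.

    Each source feature acts as both key and value (an assumption).
    Returns the fused feature vector and the attention weights.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[j] for w, feat in zip(weights, features))
        for j in range(dim)
    ]
    return fused, weights


# Hypothetical toy data: camera positions (x, y, altitude) and 2-D features.
target_pos = (0.0, 0.0, 10.0)
source_positions = [
    (0.0, 0.0, 12.0),    # low altitude, close to the target
    (5.0, 5.0, 300.0),   # mid altitude
    (1.0, 0.0, 9.0),     # low altitude, closest to the target
    (0.0, 0.0, 1000.0),  # high altitude
]
source_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.0]]

idx = select_sources(target_pos, source_positions, k=2)
query = [1.0, 0.0]  # hypothetical target-view query feature
fused, weights = attention_fuse(query, [source_features[i] for i in idx])
print(idx, weights, fused)
```

In this sketch the two cameras nearest the target are kept, and the source whose feature aligns best with the query receives the larger weight. A real system would use learned image features and learned projections for queries, keys, and values.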