STrCNeRF: an Implicit Neural Representation Method Combining Global and Local Features from an Image for Novel View Synthesis

Jubo Chen, Xiaosheng Yu, Xiaolei Tian, Junxiang Wang, Chengdong Wu
DOI: https://doi.org/10.1109/ccdc62350.2024.10587849
2024-01-01
Abstract: Recently, Novel View Synthesis (NVS) methods based on Implicit Neural Representations (INR) have attracted wide attention and demonstrated significant performance improvements. These methods focus on using implicit functions to generate images from viewpoints that are absent from the training dataset. However, current implicit approaches rely predominantly on pure Multilayer Perceptron (MLP) structures to approximate the scene function, which limits both training speed and generalization to new scenes, and they overlook the holistic and detailed information available in the source views. To address these issues, we propose a hybrid information-based implicit neural network structure for novel view synthesis. Our network adopts a dual-path feature extraction architecture based on a Swin Transformer and a CNN, capturing global and local features from the input images simultaneously. This design improves the model's generalization to new scenes and reduces training time. Furthermore, by introducing an attention-based Feature Aggregation Module (FAM), our network handles detailed features more effectively. Finally, we introduce Fourier feature functions to encode the input features, which enhances high-frequency components in the images and supports learning abstract feature representations at multiple scales. Extensive experiments confirm that the proposed network structure delivers superior qualitative and quantitative performance and higher reconstruction quality on real-world datasets and the ShapeNet dataset.
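The abstract does not give the exact form of the Fourier feature functions, so the following is only a minimal sketch of the general technique, assuming the octave-spaced sine/cosine encoding commonly used in NeRF-style models. The function name `fourier_encode` and the parameter `num_bands` are hypothetical and do not come from the paper.

```python
import numpy as np

def fourier_encode(x, num_bands=4):
    """Encode each input coordinate with sines and cosines at octave-spaced frequencies.

    Assumption: frequencies are pi * 2^k for k = 0 .. num_bands-1. The paper's
    actual frequency schedule is not stated in the abstract.
    """
    x = np.asarray(x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi       # shape (num_bands,)
    angles = x[..., None] * freqs                       # shape (..., D, num_bands)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (..., D, 2 * num_bands)
    # Flatten the per-coordinate encodings into one feature vector per input point.
    return enc.reshape(*x.shape[:-1], -1)

# One 3-D point becomes a vector of 3 * 2 * num_bands features.
points = np.array([[0.0, 0.5, 1.0]])
encoded = fourier_encode(points, num_bands=4)
```

Higher-frequency bands let a downstream MLP separate inputs that differ only slightly, which is the usual motivation for applying such an encoding before the network when fine image detail matters.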