Abstract: 3D super-resolution aims to reconstruct high-fidelity 3D models from low-resolution (LR) multi-view images. Early studies primarily focused on single-image super-resolution (SISR) models to upsample LR images into high-resolution images. However, these methods often lack view consistency because they operate independently on each image. Although various post-processing techniques have been extensively explored to mitigate these inconsistencies, they have yet to fully resolve the issues. In this paper, we perform a comprehensive study of 3D super-resolution by leveraging video super-resolution (VSR) models. By utilizing VSR models, we ensure a higher degree of spatial consistency and can reference surrounding spatial information, leading to more accurate and detailed reconstructions. Our findings reveal that VSR models can perform remarkably well even on sequences that lack precise spatial alignment. Given this observation, we propose a simple yet practical approach to align LR images without involving fine-tuning or generating a 'smooth' trajectory from the 3D models trained over LR images. The experimental results show that these surprisingly simple algorithms can achieve state-of-the-art results on 3D super-resolution tasks on standard benchmark datasets, such as the NeRF-synthetic and MipNeRF-360 datasets. Project page: https://ko-lani.github.io/Sequence-Matters

### What problem does this paper attempt to solve?

This paper addresses the challenge of reconstructing high-fidelity 3D models from low-resolution (LR) multi-view images. Earlier research mainly relied on single-image super-resolution (SISR) models, which upsample LR images into high-resolution ones but often lack view consistency because they process each image independently. Although various post-processing techniques have been widely explored to alleviate these inconsistencies, they have not fully resolved them.

To address this, the authors use video super-resolution (VSR) models, which provide higher spatial consistency and can reference surrounding spatial information, yielding more accurate and detailed reconstructions. The method aligns LR images without fine-tuning or generating "smooth" trajectories, thereby improving performance on 3D super-resolution tasks.

### Main contributions

1. **Proposed a new method**: VSR models are used to bridge the gap between low-resolution and high-resolution images. The input video sequences are built to be "smooth" enough and to contain minimal artifacts, which makes them better suited to VSR models.
2. **Proposed simple and effective sorting algorithms**: These algorithms perform better than existing methods, especially when generating "video-like" sequences.
3. **Achieved state-of-the-art performance**: The method achieves the best performance on both the NeRF-Synthetic and Mip-NeRF 360 datasets, highlighting its robustness and effectiveness on object-level and scene-level datasets.

### Core idea of the solution

The authors observe that VSR models can maintain strong performance even when the input video does not follow a "smooth" camera trajectory. Based on this, they propose simple sorting algorithms that arrange the training set into structured "video-like" sequences, improving the quality of the VSR results while avoiding the need to fine-tune the VSR model. Because these "video-like" sequences are composed of real LR images, they contain no streak or speckle artifacts, which improves the quality of the final output.

### Experimental results

The experimental results show that the proposed algorithm achieves state-of-the-art performance on both the NeRF-Synthetic and Mip-NeRF 360 datasets. In both quantitative metrics (such as PSNR, SSIM, and LPIPS) and qualitative results, the method significantly outperforms the baseline models, especially in preserving high-frequency details and staying close to the ground truth.

In conclusion, this paper addresses the view-consistency and artifact problems in 3D super-resolution by combining VSR models with simple sorting algorithms, providing new ideas and methods for future research.
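The idea of sorting unordered LR views into a "video-like" sequence can be illustrated with a small sketch. The summary above does not specify the exact sorting algorithms, so the following assumes one plausible variant: a greedy nearest-neighbor ordering over camera positions. The function name `greedy_order`, the `start` parameter, and the toy camera coordinates are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def greedy_order(positions: Sequence[Sequence[float]], start: int = 0) -> List[int]:
    """Order view indices so each view is followed by its nearest unvisited view.

    Hypothetical helper: an assumed greedy strategy, not the authors' exact algorithm.
    """
    if not positions:
        return []
    remaining = set(range(len(positions)))
    order = [start]
    remaining.remove(start)
    while remaining:
        last = positions[order[-1]]
        # Pick the closest unvisited camera; ties are broken by the lower index.
        nxt = min(remaining, key=lambda i: (math.dist(last, positions[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy example: four cameras on a line, listed in shuffled order.
cameras = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(greedy_order(cameras))  # → [0, 2, 3, 1]
```

The resulting index list would then determine the order in which the LR images are fed to a VSR model as consecutive frames. A distance on camera positions is only one possible similarity measure; a distance on image features could be swapped in without changing the structure of the loop.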

Sequence Matters: Harnessing Video Models in 3D Super-Resolution
