Abstract: Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications. Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts. Additionally, they offer neither sufficient diversity of output images nor image consistency across scales. The most relevant prior work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results. Because that model operates in the image space, the larger the resolution of the image produced, the more memory and inference time are required, and it also does not maintain scale-specific consistency. We propose a novel pipeline that can super-resolve an input image, or generate a novel image from random noise, at arbitrary scales. The method consists of a pretrained auto-encoder, a latent diffusion model, and an implicit neural decoder, together with their learning strategies. The proposed method runs the diffusion process in a latent space, which makes it efficient, while keeping it aligned with the output image space decoded by MLPs at arbitrary scales. More specifically, our arbitrary-scale decoder places the symmetric decoder of the pretrained auto-encoder, without up-scaling, in series with a Local Implicit Image Function (LIIF). The latent diffusion process is learned jointly with the denoising and alignment losses. Errors in output images are backpropagated through the fixed decoder, improving the quality of output images. In extensive experiments on multiple public benchmarks for the two tasks, i.e., image super-resolution and novel image generation at arbitrary scales, the proposed method outperforms relevant methods in metrics of image quality, diversity, and scale consistency. It is also significantly better than the relevant prior art in inference speed and memory usage.
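
To make the idea of "decoded by MLPs at arbitrary scales" concrete, here is a minimal NumPy sketch of a LIIF-style coordinate query. It is an illustration under stated assumptions, not the authors' implementation: the feature map, the toy two-layer MLP with random weights, and the names `make_coord_grid` and `liif_style_decode` are all hypothetical, and the local ensemble and feature unfolding used in the original LIIF are omitted for brevity.

```python
import numpy as np

def make_coord_grid(h, w):
    """Pixel-centre coordinates in [-1, 1] for an h x w grid, shape (h*w, 2)."""
    ys = -1 + (2 * np.arange(h) + 1) / h
    xs = -1 + (2 * np.arange(w) + 1) / w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=-1)

def liif_style_decode(feat, out_h, out_w, mlp_params):
    """Query an RGB value at every output pixel centre.

    feat:       (H, W, C) feature map (hypothetical stand-in for decoder features).
    mlp_params: (W1, b1, W2, b2) of a toy two-layer MLP (assumption, untrained).
    """
    H, W, _ = feat.shape
    coords = make_coord_grid(out_h, out_w)                      # (N, 2)
    # Index of the feature cell that contains each query coordinate.
    iy = np.clip(np.floor((coords[:, 0] + 1) / 2 * H).astype(int), 0, H - 1)
    ix = np.clip(np.floor((coords[:, 1] + 1) / 2 * W).astype(int), 0, W - 1)
    # Centre of that cell, and the query offset relative to it (in cell units).
    cy = -1 + (2 * iy + 1) / H
    cx = -1 + (2 * ix + 1) / W
    rel = np.stack([(coords[:, 0] - cy) * H, (coords[:, 1] - cx) * W], axis=-1)
    # Size of one output pixel relative to the feature grid.
    cell = np.tile([2 / out_h * H, 2 / out_w * W], (coords.shape[0], 1))
    x = np.concatenate([feat[iy, ix], rel, cell], axis=-1)      # (N, C + 4)
    W1, b1, W2, b2 = mlp_params
    hidden = np.maximum(x @ W1 + b1, 0)                         # ReLU
    return (hidden @ W2 + b2).reshape(out_h, out_w, 3)

rng = np.random.default_rng(0)
C, hidden_dim = 8, 16
params = (rng.normal(size=(C + 4, hidden_dim)), np.zeros(hidden_dim),
          rng.normal(size=(hidden_dim, 3)), np.zeros(3))
feat = rng.normal(size=(4, 4, C))
img_a = liif_style_decode(feat, 10, 10, params)   # x2.5 in both directions
img_b = liif_style_decode(feat, 13, 7, params)    # non-integer, anisotropic
```

Because the output resolution only changes the set of query coordinates, the same features and the same MLP serve every scale, which is what allows non-integer magnification factors.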

What problem does this paper attempt to address?

The paper aims to address several key issues in image super-resolution (SR) and image generation tasks:

1. **Multi-scale Generation**: Existing methods typically generate high-resolution images or new images at a fixed scale, unable to achieve arbitrary scale image generation or super-resolution.
2. **Image Quality and Diversity**: Current methods often produce images with excessive smoothing and loss of details, and the diversity of output images and consistency across different scales are insufficient.
3. **Efficiency and Memory Consumption**: Some diffusion model-based methods can generate high-quality images but perform poorly in terms of inference speed and memory usage.

To address these issues, the authors propose a new framework that combines the Latent Diffusion Model (LDM) and the Local Implicit Image Function (LIIF) decoder. This approach can efficiently generate high-quality, diverse images at arbitrary scales and shows better consistency in super-resolution tasks. Additionally, the method reduces the error between the latent space and image space during training through a two-stage alignment process, thereby improving the quality of the output images. Experimental results demonstrate that this method outperforms existing related methods on various benchmark datasets, with significant improvements in image quality and inference speed.
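
As a rough illustration of how a denoising loss and an alignment loss might be combined, the following NumPy sketch computes both terms for one noised latent. Everything specific here is an assumption rather than the paper's formulation: the standard DDPM forward-noising step, the L1 image-space alignment term, the weight `lam`, and the placeholder callables `eps_model` and `decoder` are hypothetical. It evaluates the loss values only and performs no backpropagation.

```python
import numpy as np

def joint_loss(z0, target_img, eps_model, decoder, alpha_bar_t, rng, lam=1.0):
    """Return (total, denoising, alignment) loss values for one latent z0.

    eps_model: callable predicting the noise from the noised latent (assumption).
    decoder:   callable mapping a latent to image space; treated as fixed here.
    """
    # Forward diffusion in latent space (standard DDPM noising step).
    eps = rng.normal(size=z0.shape)
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1 - alpha_bar_t) * eps
    # Denoising term: how well the model recovers the injected noise.
    eps_hat = eps_model(z_t)
    l_denoise = np.mean((eps - eps_hat) ** 2)
    # Alignment term: decode the estimated clean latent, compare in image space.
    z0_hat = (z_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    l_align = np.mean(np.abs(decoder(z0_hat) - target_img))
    return l_denoise + lam * l_align, l_denoise, l_align

# Toy usage: identity "decoder" and a noise predictor that always outputs zeros.
z0 = np.ones((2, 4, 4))
total, l_d, l_a = joint_loss(z0, z0, lambda z: np.zeros_like(z), lambda z: z,
                             alpha_bar_t=0.5, rng=np.random.default_rng(1), lam=2.0)
```

In a real training loop the alignment term is what lets image-space errors reach the latent diffusion model through the decoder; an autodiff framework would be needed for that step.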

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Efficient Model Agnostic Approach for Implicit Neural Representation Based Arbitrary-Scale Image Super-Resolution

Enhanced Implicit Function-Based Network for Arbitrary-Scale Image Super-Resolution

Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images

Implicit Diffusion Models for Continuous Super-Resolution

Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

Image Superresolution using Scale-Recurrent Dense Network

Dynamic Implicit Image Function for Efficient Arbitrary-Scale Image Representation

Deep Arbitrary-Scale Image Super-Resolution Via Scale-Equivariance Pursuit

Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Implicit Grid Convolution for Multi-Scale Image Super-Resolution

Single Image Super-Resolution via a Dual Interactive Implicit Neural Network

UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution

Arbitrary scale super-resolution diffusion model for brain MRI images

Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI

Arbitrary-Scale Image Super-Resolution via Degradation Perception

Single image super-resolution with denoising diffusion GANS

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution