Raising The Limit Of Image Rescaling Using Auxiliary Encoding

Chenzhong Yin, Zhihong Pan, Xin Zhou, Le Kang, Paul Bogdan
2023-03-13
Abstract: Normalizing flow models using invertible neural networks (INN) have been widely investigated for successful generative image super-resolution (SR) by learning the transformation between the normal distribution of latent variable $z$ and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional nature of INN to push the performance limit of image upscaling by optimizing the downscaling and upscaling steps jointly. While the random sampling of latent variable $z$ is useful in generating diverse photo-realistic images, it is not desirable for image rescaling, where accurate restoration of the HR image is more important. Hence, in place of random sampling of $z$, we propose auxiliary encoding modules to further push the limit of image rescaling performance. Two options to store the encoded latent variables in downscaled LR images, both readily supported in existing image file formats, are proposed. One is saved as the alpha channel, the other is saved as metadata in the image header, and the corresponding modules are denoted by the suffixes -A and -M respectively. Optimal network architectural changes are investigated for both options to demonstrate their effectiveness in raising the rescaling performance limit on different baseline models including IRN and DLV-IRN.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the loss of high-frequency information during image rescaling (downscaling followed by upscaling) and proposes two ways to improve rescaling models based on Invertible Neural Networks (INN). Specifically:

1. **High-Frequency Information Storage**: Downscaling discards high-frequency details, so HR images restored by upscaling are often of poor quality. Models like IRN use the bidirectional nature of invertible neural networks to optimize the downscaling and upscaling steps jointly, but they randomly sample the latent variable \( z \) during upscaling. That suits generating diverse photo-realistic images but is not ideal when precise recovery of the HR image is needed.
2. **Auxiliary Encoding Modules**: To address this, the authors propose auxiliary encoding modules that compress and store the high-frequency information instead of sampling it. Two variants are described:
   - **IRN-A**: Adds an extra alpha channel to the output low-resolution (LR) image to store the compressed high-frequency information.
   - **IRN-M**: Uses an autoencoder to compress the latent variable \( z \) into a compact latent variable and saves it as metadata of the image file.
3. **Experimental Validation**: A series of experiments was conducted to validate both methods. The results show that both significantly improve the quality of the restored HR images by preserving high-frequency information through the rescaling cycle.

In summary, the paper improves how image rescaling models handle high-frequency information in order to raise the quality of the HR images recovered by upscaling.
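The two storage options described above can be illustrated with a small round trip. The following is only a minimal sketch under stated assumptions, not the paper's method: the learned encoders (including the autoencoder of the -M variant) are replaced by plain 8-bit uniform quantization, the latent range `[-1, 1]` is assumed, and every function name, the length-prefixed byte layout, and the toy pixel values are hypothetical. It shows only where the side information would travel: either as a fourth channel next to the LR pixels, or as a separate byte blob that an image header field could carry.

```python
import struct

LO, HI = -1.0, 1.0  # assumed range of the latent values (hypothetical)


def quantize(values, lo=LO, hi=HI):
    """Map floats in [lo, hi] to 8-bit codes, clipping out-of-range values."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]


def dequantize(codes, lo=LO, hi=HI):
    """Map 8-bit codes back to approximate float values."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


def attach_alpha(rgb_pixels, codes):
    """Option -A (sketch): carry one code per LR pixel as the alpha channel."""
    if len(rgb_pixels) != len(codes):
        raise ValueError("need exactly one code per LR pixel")
    return [(r, g, b, a) for (r, g, b), a in zip(rgb_pixels, codes)]


def split_alpha(rgba_pixels):
    """Separate the visible RGB pixels from the codes held in alpha."""
    rgb = [(r, g, b) for r, g, b, _ in rgba_pixels]
    codes = [a for _, _, _, a in rgba_pixels]
    return rgb, codes


def pack_metadata(codes):
    """Option -M (sketch): a byte blob with a 4-byte big-endian length prefix."""
    return struct.pack(">I", len(codes)) + bytes(codes)


def unpack_metadata(blob):
    """Read the codes back out of the metadata blob."""
    (n,) = struct.unpack(">I", blob[:4])
    return list(blob[4:4 + n])


# Toy 2x2 LR image (flattened) and a toy latent with one value per pixel.
lr_rgb = [(10, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)]
latent = [-0.75, 0.0, 0.31, 0.98]

codes = quantize(latent)
rgba = attach_alpha(lr_rgb, codes)   # -A: side information travels in alpha
blob = pack_metadata(codes)          # -M: side information travels in a header blob

restored_a = dequantize(split_alpha(rgba)[1])
restored_m = dequantize(unpack_metadata(blob))
```

In this sketch both routes return the same approximation of the latent, with an error of at most half a quantization step per value. Only the container differs, and the visible RGB pixels are unchanged in both cases.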