Abstract: Deep learning-based methods have garnered significant attention in remote sensing (RS) image compression due to their superior performance. Most of these methods focus on enhancing the coding capability of the compression network and improving entropy model prediction accuracy. However, they typically compress and decompress each image independently, ignoring the significant inter-image similarity prior. In this paper, we propose a codebook-based RS image compression (Code-RSIC) method with a generated discrete codebook, which is deployed at the decoding end of a compression algorithm to provide inter-image similarity prior. Specifically, we first pretrain a high-quality discrete codebook using the competitive generation model VQGAN. We then introduce a Transformer-based prediction model to align the latent features of the decoded images from an existing compression algorithm with the frozen high-quality codebook. Finally, we develop a hierarchical prior integration network (HPIN), which mainly consists of Transformer blocks and multi-head cross-attention modules (MCMs) that can query hierarchical prior from the codebook, thus enhancing the ability of the proposed method to decode texture-rich RS images. Extensive experimental results demonstrate that the proposed Code-RSIC significantly outperforms state-of-the-art traditional and learning-based image compression algorithms in terms of perception quality. The code will be available at https://github.com/mlkk518/Code-RSIC/

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **how to use the inter-image similarity prior to improve the perceptual quality of low-bit-rate remote sensing image compression**.

Specifically, although existing deep-learning-based remote sensing image compression methods have improved coding ability and entropy model prediction accuracy, they usually compress and decompress each image independently, ignoring the inter-image similarity prior. This omission leads to poor perceptual quality of the decoded remote sensing images at low bit-rates, especially in texture-rich areas.

To solve this problem, the authors propose a codebook-based remote sensing image compression method (Code-RSIC), which generates a discrete codebook to provide the inter-image similarity prior and uses this prior information at the decoding end to enhance the decoding performance. The specific steps are as follows:

1. **Codebook Learning (Stage I)**:
   - Use VQGAN to pre-train a high-quality discrete codebook.
   - Optimize the codebook by minimizing the reconstruction loss, the perceptual loss, and the adversarial loss.
2. **Transformer-based Codebook Lookup (Stage II)**:
   - Introduce a Transformer module to predict the code sequence from the decoded low-quality features.
   - Use the cross-entropy loss and the L2 loss to train and fine-tune the Transformer module and the encoder.
3. **Hierarchical Prior Integration Network (Stage III)**:
   - Build a hierarchical prior integration network (HPIN) containing Transformer blocks and multi-head cross-attention modules (MCMs) to query the hierarchical prior information in the codebook.
   - Improve the quality of the decoded image by fusing the intermediate features and the codebook prior information.
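
As a rough illustration of the Stage III idea of querying codebook prior features with cross-attention, the following NumPy sketch may help. It is not the authors' HPIN: it omits the learned query/key/value projections, the Transformer blocks, and the hierarchy of scales, and the function name `cross_attention`, the head count, and the residual fusion are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, prior, num_heads=2):
    """Hypothetical multi-head cross-attention without learned projections.

    queries: (Lq, d) intermediate decoder features (assumed shape)
    prior:   (Lk, d) features looked up from the codebook (assumed shape)
    Returns the fused features (Lq, d) and attention weights (heads, Lq, Lk).
    """
    Lq, d = queries.shape
    Lk = prior.shape[0]
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    dh = d // num_heads

    # Split the channel dimension into heads: (heads, L, dh).
    q = queries.reshape(Lq, num_heads, dh).transpose(1, 0, 2)
    k = prior.reshape(Lk, num_heads, dh).transpose(1, 0, 2)
    v = k  # keys and values both come from the codebook prior

    # Scaled dot-product attention of decoder features over the prior.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, Lq, Lk)
    weights = softmax(scores, axis=-1)
    attended = weights @ v                            # (heads, Lq, dh)

    # Merge heads and fuse with the input through a residual connection
    # (the residual is an assumption of this sketch).
    attended = attended.transpose(1, 0, 2).reshape(Lq, d)
    return queries + attended, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))   # 16 spatial positions, 8 channels
prior = rng.normal(size=(16, 8))   # codebook prior at the same positions
fused, attn = cross_attention(feats, prior)
print(fused.shape, attn.shape)
```

Each row of `attn` is a distribution over prior positions, so every decoded position receives a weighted mix of codebook features in addition to its own features.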

Through these steps, Code-RSIC can significantly improve the perceptual quality of remote sensing images at low bit-rates, surpassing the existing traditional and learning-based image compression algorithms.

### Summary of Key Formulas

- **Quantization Operation**:
  \[ F_c(u,v)=\arg\min_{c_n\in C}\|F_h(u,v)-c_n\| \]
  \[ S(u,v)=\arg\min_n\|F_h(u,v)-c_n\| \]
- **Loss Function**:
  - Codebook Learning Stage:
    \[ L_{s1}=L_{rec}+L_{per}+L_{cl}+\lambda_1L_{adv} \]
    where,
    \[ L_{cl}=\|SG[F_h]-F_c\|^2+\alpha\|F_h - SG[F_c]\|^2 \]
    \[ \lambda_1=\frac{\|\nabla_{D_H}[L_{rec}]\|}{\|\nabla_{D_H}[L_{adv}]\|+\epsilon} \]
  - Codebook Lookup Stage:
    \[ L_{s2}=L_{qf}+\lambda_2L_{ce} \]
    where,
    \[ L_{ce}=\sum_{n = 0}^{N - 1}-S_n\log(\hat{S}_n) \]
    \[ L_{qf}=\|F_l - SG(F_c)\|^2 \]
  - Hierarchical Prior Integration Stage:
    \[ L_{s3}=L_{s2}+L'_{rec}+L'_{per}+\lambda_3L'_{adv} \]
    where,
    \[ \lambda_3=\frac{\|\nabla_{DP}[L'_{rec}]\|}{\|\nabla_{DP}[L'_{adv}]\|+\epsilon} \]
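
To make the quantization operation and two of the loss terms concrete, here is a minimal NumPy sketch. It computes forward values only: the stop-gradient operator SG changes gradients, not values, so it does not appear in the code. The function names, the toy codebook, the tensor shapes, and the value `alpha=0.25` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def quantize(F_h, codebook):
    """Nearest-neighbour lookup: F_h is (H, W, d), codebook is (N, d).

    Returns F_c (H, W, d), the selected code vectors, and S (H, W),
    the selected code indices.
    """
    H, W, d = F_h.shape
    flat = F_h.reshape(-1, d)
    # Squared Euclidean distance from every feature to every code: (H*W, N).
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    S = dists.argmin(axis=1)
    F_c = codebook[S].reshape(H, W, d)
    return F_c, S.reshape(H, W)

def codebook_loss_value(F_h, F_c, alpha=0.25):
    # Forward value of L_cl. Both terms have the same value; they differ
    # only in where SG blocks the gradient. alpha=0.25 is an assumed value.
    sq = np.sum((F_h - F_c) ** 2)
    return sq + alpha * sq

def cross_entropy(S, logits):
    # L_ce with one-hot targets: S is (L,) integer indices, logits is (L, N).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(S.size), S].sum()

# Toy data: three 2-D codes and a 2x2 feature map.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
F_h = np.array([[[0.1, -0.1], [0.9, 1.2]],
                [[3.5, 4.2], [0.7, 0.6]]])

F_c, S = quantize(F_h, codebook)
l_cl = codebook_loss_value(F_h, F_c)
l_ce = cross_entropy(S.ravel(), np.zeros((4, 3)))  # uniform predictions
print(S, l_cl, l_ce)
```

In Stage II as described above, a Transformer would predict the indices `S` from the low-quality decoded features rather than finding them by nearest-neighbour search; the uniform logits here only stand in for that prediction.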

Exploiting Inter-Image Similarity Prior for Low-Bitrate Remote Sensing Image Compression

Remote Sensing Image Compression Based on High-Frequency and Low-Frequency Components

Remote Sensing Image Coding for Machines on Semantic Segmentation via Contrastive Learning

Object-Fidelity Remote Sensing Image Compression With Content-Weighted Bitrate Allocation and Patch-Based Local Attention

Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering

Remote-Sensing Image Compression Using Priori-Information and Feature Registration

Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates

Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network

Co-Compression via Superior Gene for Remote Sensing Scene Classification

Probability Prediction Network With Checkerboard Prior for Lossless Remote Sensing Image Compression

Enhanced Remote Sensing Image Compression Method Using Large Network with Sparse Extracting Strategy

Learning-Based Scalable Image Compression With Latent-Feature Reuse and Prediction

Semantic Scalable Image Compression with Cross-Layer Priors

Coarse-to-Fine Hyper-Prior Modeling for Learned Image Compression

Hyperspectral Image Compression Via Cross-Channel Contrastive Learning

Spatial-Temporal Context Model for Remote Sensing Imagery Compression

A Deep Image Coding Scheme With Generative Network to Learn From Correlated Images

Prior-Information-Based Remote Sensing Image Compression with Bayesian Dictionary Learning

Object-Based Image Coding: A Learning-Driven Revisit

Neural Image Compression Using Masked Sparse Visual Representation