Expanding the Effective Receptive Field for Learned Image Compression

Yunhui Shi, Yalong Su, Jin Wang, Nam Ling, Baocai Yin
DOI: https://doi.org/10.1109/mmsp61759.2024.10743718
2024-01-01
Abstract: Transformer architectures have surpassed traditional CNN-based methods in learned image compression, primarily because of their larger receptive field. In learned image compression, the effective receptive field is crucial. Although a Transformer has an extensive theoretical receptive field, its effective receptive field in image compression models is much smaller than that theoretical value, and it comes with higher computational cost. To address this, this paper proposes a Multi-scale Spatial Channel Fusion (MSCF) mechanism that brings the effective receptive field of CNNs on par with that of Transformers while retaining the low complexity and high efficiency of CNNs. In addition, learned image compression tends to lose a significant amount of high-frequency content. To compensate, we introduce a High-Frequency Enhancement (HFE) module. We integrate the MSCF mechanism and the HFE module into the MLIC++ framework. Experimental results indicate that the proposed model, Multiscale Feature Extraction and High-Frequency Enhancement for Learned Image Compression (HMLIC), achieves a substantial performance improvement over the baseline model on the Kodak, CLIC Professional Validation, and Tecnick test datasets, with only a minimal increase in model complexity.