Rate-Distortion-Perception Controllable Joint Source-Channel Coding for High-Fidelity Generative Communications

Kailin Tan, Jincheng Dai, Zhenyu Liu, Sixian Wang, Xiaoqi Qin, Wenjun Xu, Kai Niu, Ping Zhang
2024-08-26
Abstract: End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, often failing to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perception quality in human communications. Our RDP-JSCC framework integrates flexible plug-in conditional generative adversarial networks (GANs) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions that typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, enabling the customization of appearance realism for each image region at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.
Image and Video Processing
What problem does this paper attempt to address?
This paper addresses end-to-end image transmission in intelligent wireless communications, in particular how to transmit large media data efficiently when bandwidth is limited. Existing methods mainly optimize the trade-off between bandwidth cost and objective distortion, and often fail to deliver results consistent with human perception. Specifically:

1. **Limitations of traditional methods**:
   - **Low bandwidth efficiency**: Traditional communication systems follow the source-channel separation paradigm, using rate-distortion theory for source coding and channel coding theory for transmission. Although these systems can minimize the source data size and ensure reliable transmission over noisy channels, they may waste significant bandwidth.
   - **Poor perceptual quality**: Existing methods usually optimize only a single distortion metric (such as mean-square error), which yields blurry or poorly textured reconstructions that cannot meet high-fidelity visual requirements.

2. **Research objectives**:
   - **Improve perceptual quality**: To improve perceptual quality in wireless image transmission while keeping distortion bounded, the authors propose a new joint source-channel coding (JSCC) framework that jointly optimizes rate, distortion, and perception (RDP). The framework integrates flexible plug-in conditional generative adversarial networks (GANs) to produce detailed and realistic reconstructions at the receiver.
   - **Cope with different bandwidth conditions**: To serve personalized needs under different bandwidth conditions, two models are proposed:
     - **Distortion-perception controllable transmission (DPCT) model**: Intended for cases with sufficient bandwidth, it allows flexible control over the level of realism in each image region to meet different user needs.
     - **Interest-oriented content-controllable transmission (CCT) model**: Intended for cases with scarce bandwidth, it prioritizes the transmission of regions of user interest and generates the other regions from an instance label map, preserving content consistency and appearance realism.

3. **Innovations**:
   - **RDP joint optimization**: The framework is presented as the first JSCC framework to jointly optimize rate, distortion, and perception, which lets it adapt better to the complex requirements of practical applications.
   - **Flexible perceptual enhancement module**: A lightweight spatial realism embedding module (SREM) conditions the generator on a realism map, so the appearance realism of each image region can be customized at the receiver, enabling personalized image reconstruction.

### Summary

The main purpose of this paper is to develop a new RDP-JSCC framework that improves perceptual quality in wireless image transmission while keeping distortion bounded. By combining deep learning techniques with generative adversarial networks, the framework can adapt its transmission behavior to different bandwidth conditions, offering a more efficient and higher-quality solution for intelligent wireless communications.
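
As a rough illustration of what joint rate-distortion-perception optimization means, the sketch below folds the three terms into one weighted objective. It is a minimal sketch under stated assumptions, not the authors' loss: the weighted-sum form, the names `RDPWeights`, `bandwidth_ratio`, `mse`, and `rdp_objective`, the weight values, and the placeholder perception score are all hypothetical. In a GAN-based system the perception term would typically come from a discriminator or a learned perceptual metric rather than a hand-set number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RDPWeights:
    """Hypothetical trade-off weights (assumed for illustration, not from the paper)."""
    distortion: float = 1.0
    perception: float = 0.1


def bandwidth_ratio(num_channel_symbols: int, num_source_values: int) -> float:
    """Rate proxy: channel symbols sent per source value (assumed definition)."""
    if num_source_values <= 0:
        raise ValueError("num_source_values must be positive")
    return num_channel_symbols / num_source_values


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean-square error, the kind of objective distortion metric the summary mentions."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rdp_objective(rate: float, distortion: float, perception: float,
                  weights: RDPWeights) -> float:
    """Weighted sum of rate, distortion, and perception terms (assumed form)."""
    return rate + weights.distortion * distortion + weights.perception * perception


# Toy example: four source values sent with two channel symbols.
x = [0.0, 1.0, 0.5, 0.25]
x_hat = [0.0, 0.5, 0.5, 0.25]
rate = bandwidth_ratio(num_channel_symbols=2, num_source_values=len(x))
distortion = mse(x, x_hat)
perception = 0.2  # placeholder score; a real system would compute this with a model
loss = rdp_objective(rate, distortion, perception, RDPWeights())
print(f"rate={rate}, distortion={distortion}, loss={loss:.4f}")
```

Raising `RDPWeights.perception` relative to `RDPWeights.distortion` pushes the objective toward realism at the cost of pixel-wise fidelity, which is the trade-off the DPCT model is described as controlling per image region.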