Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models

Senran Fan,Zhicheng Bao,Chen Dong,Haotai Liang,Xiaodong Xu,Ping Zhang
2024-10-26
Abstract: The end-to-end image communication system has been widely studied in the academic community. The escalating demands on image communication systems in terms of data volume, environmental complexity, and task precision require enhanced communication efficiency, anti-noise ability and semantic fidelity. Therefore, we propose a novel paradigm based on Semantic Feature Decomposition (SeFD) for the integration of semantic communication and large-scale visual generation models to achieve high-performance, highly interpretable and controllable image communication. According to this paradigm, a Texture-Color based Semantic Communication system of Images (TCSCI) is proposed. At the transmitter, TCSCI decomposes each image into its natural language description (text), texture and color semantic features. These features are transmitted over the wireless channel, and at the receiver, a large-scale visual generation model is used to reconstruct the image from the received features. TCSCI can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process. The experiments demonstrate that TCSCI outperforms traditional image communication systems and existing semantic communication systems under extreme compression, with good anti-noise performance and interpretability.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to address the challenges created by the increasing demands on image communication systems in terms of data volume, environmental complexity, and task precision. Specifically, it aims to improve three key properties of image communication systems:

1. **Communication efficiency**: whether the system can be applied to various tasks with limited channel resources.
2. **Noise resistance**: whether the system can support normal transmission in complex communication scenarios (such as under low signal-to-noise ratio conditions).
3. **Semantic fidelity**: whether the image transmission system can preserve the key semantic content of the image so that downstream tasks can still be carried out.

To achieve these goals, the authors propose a new paradigm based on Semantic Feature Decomposition (SeFD), which combines semantic communication with large-scale visual generation models to achieve high-performance, highly interpretable, and controllable image communication. According to this paradigm, the authors propose a Texture-Color based Semantic Communication system of Images (TCSCI). The main features of TCSCI are as follows:

- **High compression ratio**: The image is decomposed into a natural-language description, texture semantic features, and color semantic features, and these features are further compressed for transmission, which yields an extremely high compression ratio.
- **Strong noise resistance**: Large-scale visual generation models (such as Stable Diffusion) and joint source-channel coding techniques significantly improve the noise resistance of the system.
- **High visual similarity**: ControlNet is used to control the Stable Diffusion model and reconstruct the image from the received semantic features, so that the reconstructed image is visually similar to the original image.
- **Strong interpretability and controllability**: The whole process is interpretable and controllable, and users can edit and adjust semantic features as needed.

The experimental results show that TCSCI achieves image transmission with high visual similarity under extreme compression, surpassing traditional image communication systems and existing semantic communication systems, and exhibits better noise resistance under low signal-to-noise ratio conditions.

### Formula summary

The formulas involved in the paper are as follows:

1. Definition of the semantic feature set \( S \):

\[ S=\{s_{\text{text}}, s_{\text{texture}}, s_{\text{color}}\} \]

where \( S \) is the set of semantic features extracted from the input image.

2. Encoding and decoding processes in module B:

\[ C = E(S) \]

\[ C'=C_{\text{channel}}(C) \]

\[ S'=D(C') \]

where \( E(\cdot) \) and \( D(\cdot) \) represent the overall encoding and decoding processes in module B respectively, \( C \) represents the encoded set transmitted through the wireless channel, \( C' \) represents the set received after the channel \( C_{\text{channel}}(\cdot) \), and \( S' \) represents the decoded semantic features.

3. Image reconstruction process:

\[ I'=SD(s'_{\text{text}}, \text{Control}(s'_{\text{texture}}, s'_{\text{color}})) \]

where \( SD(\cdot) \) and \( \text{Control}(\cdot) \) represent Stable Diffusion and ControlNet respectively, \( s' \) represents the corresponding semantic features after transmission through the channel, and \( I' \) represents the final generated image.

Together, these formulas describe the three stages of the system: semantic feature extraction, encoded transmission over the channel, and generative reconstruction at the receiver.
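
The chain \( C = E(S) \), \( C' = C_{\text{channel}}(C) \), \( S' = D(C') \) can be illustrated with a small numerical sketch. This is not the paper's implementation. The functions `encode`, `awgn_channel`, `decode` and `transmit` are hypothetical stand-ins, the channel is assumed to be additive white Gaussian noise, each feature (including the text feature) is assumed to be already available as a numeric array, and the per-feature scale and shape are assumed to reach the receiver without error. The reconstruction step \( I' = SD(\cdot) \) is left out because it requires a pretrained generative model.

```python
import numpy as np


def encode(features):
    """Stand-in for E(.): flatten each feature and scale it to unit average power."""
    coded = {}
    for name, value in features.items():
        arr = np.asarray(value, dtype=np.float64)
        x = arr.ravel()
        scale = float(np.sqrt(np.mean(x ** 2)))
        if scale == 0.0:
            scale = 1.0  # avoid dividing by zero for an all-zero feature
        # Assumption: scale and shape are side information sent without error.
        coded[name] = (x / scale, scale, arr.shape)
    return coded


def awgn_channel(coded, snr_db, rng):
    """Stand-in for C_channel(.): add white Gaussian noise at the given SNR in dB."""
    # Signal power is 1 after encode(), so noise power is 10^(-SNR/10).
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))
    return {
        name: (x + rng.normal(0.0, noise_std, x.shape), scale, shape)
        for name, (x, scale, shape) in coded.items()
    }


def decode(received):
    """Stand-in for D(.): undo the power scaling and restore the original shape."""
    return {
        name: (y * scale).reshape(shape)
        for name, (y, scale, shape) in received.items()
    }


def transmit(features, snr_db, seed=0):
    """Compute S' = D(C_channel(E(S))) for a dictionary of feature arrays."""
    rng = np.random.default_rng(seed)
    return decode(awgn_channel(encode(features), snr_db, rng))


# Toy feature set S; the values are made up purely for illustration.
S = {
    "text": np.array([0.2, -0.5, 0.9]),
    "texture": np.ones((4, 4)),
    "color": np.linspace(0.0, 1.0, 6).reshape(2, 3),
}
S_prime = transmit(S, snr_db=10.0)
for name in S:
    print(name, "max abs error:", np.max(np.abs(S_prime[name] - S[name])))
```

Lowering `snr_db` increases the error in every recovered feature. In the system described above, the generative model at the receiver is what is expected to tolerate such errors, since it reconstructs the image from features rather than from pixels.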