Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model

Xinfeng Wei, Haonan Tong, Nuocheng Yang, Changchuan Yin
2024-09-26
Abstract: Ubiquitous image transmission in emerging applications places a heavy overhead on limited wireless resources. Because text conveys a large amount of information with very little data, transmitting a descriptive text of an image instead of the image itself can reduce the amount of transmitted data. In this context, this paper develops a novel semantic communication framework based on a text-to-image generative model (Gen-SC). In particular, a transmitter converts the input image into textual modality data. The text is then transmitted through a noisy channel to the receiver, which uses the received text to generate images. Additionally, to improve the robustness of text transmission over noisy channels, we design a transformer-based text transmission codec model. Moreover, we obtain a personalized knowledge base by fine-tuning the diffusion model to meet the requirements of task-oriented transmission scenarios. Simulation results show that, for portrait image transmission, the proposed framework achieves high perceptual quality while reducing the transmitted data volume by up to 99%, and is robust to wireless channel noise.
Multimedia
What problem does this paper attempt to address?
The paper addresses image transmission under resource constraints (such as spectrum and power) and in harsh environments, particularly the need for efficient image transmission in wireless networks. Specifically, it proposes a novel semantic communication framework based on a text-to-image generative model (Gen-SC), which significantly reduces the amount of transmitted data by converting input images into text and sending the text instead. To make text transmission more robust over noisy channels, the authors design a transformer-based text transmission codec model, and they fine-tune a diffusion model to meet the needs of task-oriented transmission scenarios. Simulation results show that, in portrait image transmission, the framework achieves high perceptual quality with up to 99% data reduction and is robust to wireless channel noise.
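To make the data-reduction figure concrete, the arithmetic behind sending a caption in place of an image can be sketched as follows. This is only an illustration: the caption, the 50,000-byte image size, and the helper name `reduction_ratio` are assumptions chosen for the example, not values or code from the paper.

```python
def reduction_ratio(image_bytes: int, text_bytes: int) -> float:
    """Fraction of transmitted data saved by sending text instead of the image."""
    if image_bytes <= 0:
        raise ValueError("image_bytes must be positive")
    return 1.0 - text_bytes / image_bytes


# Hypothetical caption and image size (assumptions, not from the paper).
caption = "a portrait of a young woman with short dark hair, smiling, studio lighting"
text_bytes = len(caption.encode("utf-8"))  # payload size of the descriptive text
image_bytes = 50_000                       # assumed size of a compressed portrait image

ratio = reduction_ratio(image_bytes, text_bytes)
print(f"{text_bytes} B of text vs {image_bytes} B of image: {ratio:.2%} saved")
```

Under these assumed sizes the saving exceeds 99%, which matches the order of magnitude the abstract reports. The actual figure depends on the image codec used as the baseline and on the overhead added by channel coding of the text.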