Abstract: This paper proposes a new communication-system framework that leverages the promising generation capabilities of multi-modal generative models. For today's smart applications, successful communication can be achieved by conveying the perceptual meaning of the data, which we represent as a text prompt. Text is a suitable semantic representation of image data: multi-modal techniques can now use text to instruct or generate images, interpreting it in a manner similar to human cognition. Transmitting text also reduces the communication load compared with transmitting the intact data itself. The transmitter converts the target image into text through a multi-modal generation process, and the receiver reconstructs the image through the reverse process. Each word in the sentence has its own syntactic role and carries a particular piece of the information the text contains. To further reduce the communication load, the transmitter sends the words sequentially, in order of how much information they carry, until communication succeeds. Our primary focus is therefore the design of a communication system based on image-to-text transformation and of schemes for sequentially transmitting word tokens. We expect this work to pave the way for applying state-of-the-art generative models in real communication systems.
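
The sequential transmission loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `reconstruct` stands in for the receiver's text-to-image model, and `similarity` stands in for whatever fidelity measure defines "successful communication", which the abstract does not fix. The words are assumed to arrive already sorted by priority. All names and toy values are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

Image = TypeVar("Image")  # whatever the (hypothetical) generator returns


def sequential_transmit(
    ordered_words: Sequence[str],
    reconstruct: Callable[[str], Image],
    similarity: Callable[[Image], float],
    threshold: float,
) -> Tuple[List[str], float]:
    """Send words one at a time, in the given priority order, until the
    reconstruction from the words sent so far is similar enough to the source.

    Assumption: the transmitter can evaluate `reconstruct` and `similarity`
    itself (e.g. it runs the same text-to-image model as the receiver).
    """
    sent: List[str] = []
    score = float("-inf")
    for word in ordered_words:
        sent.append(word)                    # transmit one more word token
        image = reconstruct(" ".join(sent))  # receiver side: text -> image
        score = similarity(image)            # compare with the source image
        if score >= threshold:               # stop once communication succeeds
            break
    return sent, score


# Toy stand-ins (hypothetical): an "image" is just the set of concepts it shows.
TARGET_CONCEPTS: Set[str] = {"dog", "running", "beach"}


def toy_reconstruct(prompt: str) -> Set[str]:
    return set(prompt.split())


def toy_similarity(image: Set[str]) -> float:
    return len(image & TARGET_CONCEPTS) / len(TARGET_CONCEPTS)


if __name__ == "__main__":
    words = ["dog", "beach", "running", "sunny"]
    sent, score = sequential_transmit(words, toy_reconstruct, toy_similarity, 1.0)
    print(sent, score)  # ['dog', 'beach', 'running'] 1.0
```

In the toy run, the loop stops after three words, so the low-priority word "sunny" is never sent. Because the stopping test compares against the source image, a real system would need either a feedback channel from the receiver or a transmitter that simulates the receiver's generator. The sketch assumes the latter.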

What problem does this paper attempt to address?

The problem this paper addresses is how to transmit image information efficiently in communication systems, especially when bandwidth is limited. Traditional methods transmit the raw image data directly. This preserves image quality, but it consumes a large amount of bandwidth and is inefficient. To address this, the paper proposes a new framework based on multi-modal generative models. The image is converted into text for transmission, and the receiver reconstructs the image from that text, achieving efficient, low-load transmission of image information. Specifically, the main contributions of the paper are:

1. **A new communication framework**: Using the capabilities of multi-modal generative models, the transmitter converts the image into text and sends the text over the network, and the receiver reconstructs the image from it. This can significantly reduce the communication load while maintaining high image fidelity.
2. **Semantic-order communication**: To further improve efficiency, the paper introduces semantic-order communication. The sender transmits the words of the text in order of the amount of information each word carries, until the receiver reconstructs an image similar to the original. This reduces the amount of transmitted data and reaches high image similarity within the first few steps.
3. **Multiple word-ordering strategies**: To determine the sending order of the words, the paper explores several methods: word selection by LPIPS loss minimization, a most-attended-word transmission method based on the attention mechanism, and a least-attended-word transmission method.
Each method has its own advantages and disadvantages, but overall they effectively improve communication efficiency. In summary, this paper explores how to use advanced multi-modal generative models to design efficient communication systems, particularly for transmitting image information, and offers an innovative solution.
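
The three word-ordering strategies listed above can be sketched as follows. This is a hedged illustration, not the paper's code. `loss` is a placeholder for LPIPS computed between the source image and the image the receiver would generate from a partial prompt. The per-word attention scores are assumed to come from the text-to-image model, and how they are aggregated is not specified here. All function names and toy values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def _partial_prompt(words: Sequence[str], indices: List[int]) -> str:
    # Assumption: a partial prompt keeps the selected words in sentence order.
    return " ".join(words[j] for j in sorted(indices))


def greedy_loss_order(words: Sequence[str], loss: Callable[[str], float]) -> List[str]:
    """Greedy ordering: at each step, pick the unsent word whose addition gives
    the lowest loss (a stand-in for LPIPS) for the resulting partial prompt."""
    remaining = list(range(len(words)))
    chosen: List[int] = []
    while remaining:
        # min() returns the first candidate among ties, so ties keep sentence order.
        best = min(remaining, key=lambda i: loss(_partial_prompt(words, chosen + [i])))
        chosen.append(best)
        remaining.remove(best)
    return [words[i] for i in chosen]


def attention_order(
    words: Sequence[str], attention: Sequence[float], most_first: bool = True
) -> List[str]:
    """Order words by a per-word attention score. most_first=True gives the
    most-attended-first strategy, most_first=False the least-attended-first one.
    Ties keep their original sentence order (sorted() is stable)."""
    order = sorted(range(len(words)), key=lambda i: attention[i], reverse=most_first)
    return [words[i] for i in order]


# Toy stand-ins (hypothetical): integer "importance" weights instead of a real
# text-to-image model plus LPIPS, and made-up attention scores.
WORDS = ["a", "red", "car", "on", "road"]
WEIGHTS: Dict[str, int] = {"car": 5, "road": 3, "red": 2}
ATTENTION = [0.05, 0.2, 0.4, 0.05, 0.3]


def toy_loss(prompt: str) -> float:
    # Lower is better: total weight minus the weight of the concepts present.
    return sum(WEIGHTS.values()) - sum(WEIGHTS.get(w, 0) for w in prompt.split())


if __name__ == "__main__":
    print(greedy_loss_order(WORDS, toy_loss))                   # ['car', 'road', 'red', 'a', 'on']
    print(attention_order(WORDS, ATTENTION))                    # ['car', 'road', 'red', 'a', 'on']
    print(attention_order(WORDS, ATTENTION, most_first=False))  # ['a', 'on', 'red', 'road', 'car']
```

With these made-up values, the greedy and most-attended orders happen to coincide. In general they need not. The greedy method evaluates the loss once per remaining candidate at every step, which is n(n+1)/2 evaluations (each implying an image generation) for an n-word sentence. The attention-based orderings only sort one score per word.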

Sequential Semantic Generative Communication for Progressive Text-to-Image Generation
