Sequential Semantic Generative Communication for Progressive Text-to-Image Generation

Hyelin Nam, Jihong Park, Jinho Choi, Seong-Lyun Kim
2023-09-08
Abstract: This paper proposes a new communication system framework that leverages the generation capabilities of multi-modal generative models. For today's smart applications, communication can succeed by conveying the perceptual meaning of the data, which we represent as a text prompt. Text is a suitable semantic representation of image data because it has evolved to describe or generate images through multi-modal techniques, and it is interpreted in a manner similar to human cognition. Using text also reduces the communication load compared with transmitting the intact data itself. The transmitter converts the target image to text through a multi-modal generation process, and the receiver reconstructs the image using the reverse process. Each word in the text sentence has its own syntactic role and is responsible for a particular piece of the information the text contains. To further reduce the communication load, the transmitter sends words sequentially, prioritizing those that carry the most information, until communication succeeds. Our primary focus is therefore on the design of a communication system based on image-to-text transformation and on the proposed schemes for sequentially transmitting word tokens. Our work is expected to pave the way for applying state-of-the-art generative models to real communication systems.
Signal Processing, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to transmit image information efficiently in communication systems, especially when bandwidth is limited. Traditional communication methods usually transmit the raw image data directly. This preserves image quality but consumes a large amount of bandwidth. The paper instead proposes a framework based on multimodal generative models: the image is converted into text for transmission, and the receiver reconstructs the image from that text, which lowers the communication load.

Specifically, the main contributions of the paper are:

1. **Proposing a new communication framework**: Using a multimodal generative model, the transmitter converts the image into text and sends the text over the network, and the receiver reconstructs the image from the received text. This can significantly reduce the communication load while maintaining high image fidelity.
2. **Introducing semantic-order communication**: To further improve communication efficiency, the sender transmits the words of the text in order of priority, ranked by the amount of information each word carries, until an image similar to the original has been reconstructed. This reduces the amount of transmitted data and also reaches a high image similarity within the first few steps.
3. **Designing multiple word-ordering strategies**: To determine the order in which words are sent, the paper explores several methods: word selection based on minimizing the LPIPS (Learned Perceptual Image Patch Similarity) loss, transmitting the most-attended word first based on the attention mechanism, and transmitting the least-attended word first.

Each of these methods has its own advantages and disadvantages, but overall they improve communication efficiency. In summary, the paper explores how advanced multimodal generative models can be used to design efficient communication systems, particularly for transmitting image information.
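The word-ordering strategies described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`greedy_order`, `attention_order`, `transmit`), the example caption, and the per-word weights are all hypothetical. In particular, `toy_loss` is only a stand-in for the LPIPS distance between the original image and the image regenerated from the words sent so far, since computing that distance would require an actual text-to-image model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Loss = Callable[[List[str]], float]


def greedy_order(words: Sequence[str], loss: Loss) -> List[str]:
    """Order words so that each step adds the word giving the lowest loss."""
    remaining = list(words)
    sent: List[str] = []
    while remaining:
        # Try each remaining word and keep the one that minimizes the loss.
        best = min(remaining, key=lambda w: loss(sent + [w]))
        sent.append(best)
        remaining.remove(best)
    return sent


def attention_order(scores: Dict[str, float], most_first: bool = True) -> List[str]:
    """Order words by attention score, most-attended or least-attended first."""
    return sorted(scores, key=lambda w: scores[w], reverse=most_first)


def transmit(order: Sequence[str], loss: Loss, threshold: float) -> Tuple[List[str], float]:
    """Send words one at a time and stop once the loss is at or below the threshold."""
    sent: List[str] = []
    for word in order:
        sent.append(word)
        if loss(sent) <= threshold:
            break
    return sent, loss(sent)


# Hypothetical caption and per-word information weights (assumed, not from the paper).
WORDS = ["a", "brown", "dog", "runs", "on", "grass"]
WEIGHTS = {"a": 0.0, "brown": 0.125, "dog": 0.5, "runs": 0.125, "on": 0.0, "grass": 0.25}


def toy_loss(sent: List[str]) -> float:
    """Stand-in for LPIPS: the loss shrinks as more informative words arrive."""
    return 1.0 - sum(WEIGHTS[w] for w in sent)


if __name__ == "__main__":
    order = greedy_order(WORDS, toy_loss)
    sent, final_loss = transmit(order, toy_loss, threshold=0.25)
    print("order:", order)
    print("sent:", sent, "final loss:", final_loss)
```

In this toy setting the greedy ordering sends the highest-weight words first, so the threshold is reached after only part of the caption has been transmitted. With a real generative model, each call to the loss would involve regenerating an image, so the greedy search would be far more expensive than the attention-based orderings, which need only one score per word.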