Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering

Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Ping Zhang, Xuemin Shen
2024-04-22
Abstract: Massive deployments of Mobile AI-Generated Content (AIGC) Service Providers (MASPs) running powerful models can make high-quality AIGC services accessible to resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users must download large AIGC outputs from the MASPs, which leads to substantial bandwidth consumption and potential transmission failures. In this paper, we apply cross-modal Generative Semantic Communications (G-SemCom) in mobile AIGC to overcome wireless bandwidth constraints. Specifically, we use a series of cross-modal attention maps to indicate the correlation between user prompts and each part of the AIGC output. In this way, the MASP can analyze the prompt context and efficiently filter out the most semantically important content. Only this semantic information is transmitted, and from it users can recover the entire AIGC output with high quality while saving mobile bandwidth. Since the transmitted information not only preserves the semantics but also serves as a prompt for the recovery, we formulate a joint semantic encoding and prompt engineering problem to optimize the bandwidth allocation among users. In particular, we present a human-perceptual metric named Joint Perceptual Similarity and Quality (JPSQ), which fuses two learning-based measurements: one of semantic similarity and one of aesthetic quality. Furthermore, we develop the Attention-aware Deep Diffusion (ADD) algorithm, which learns attention maps and leverages the diffusion process to enhance its environment exploration ability. Extensive experiments demonstrate that our proposal can reduce the bandwidth consumption of mobile users by 49.4% on average, with almost no perceptual difference in AIGC output quality. Moreover, the ADD algorithm outperforms baseline deep reinforcement learning (DRL) methods, achieving a 1.74× higher overall reward.
Networking and Internet Architecture
What problem does this paper attempt to address?
This paper addresses how to reduce bandwidth consumption in mobile AI-Generated Content (mobile AIGC) while still delivering high-quality AIGC outputs. Existing mobile AIGC approaches let resource-constrained end users access powerful AIGC models, but users must then download large AIGC outputs from Mobile AIGC Service Providers (MASPs). This consumes substantial bandwidth and risks transmission failures, which both degrades the user experience and raises users' data costs. To address these problems, the paper proposes a Cross-Modal Generative Semantic Communications (G-SemCom) framework, which overcomes wireless bandwidth limitations in three ways:
1. **Cross-Modal Attention Maps**: While generating an AIGC output, the MASP uses a series of cross-modal attention maps to capture the correlation between the user prompt and each part of the output. This lets the MASP analyze the prompt context and efficiently filter out the most semantically important content.
2. **Semantic Information Transmission**: Only semantic information, rather than the complete AIGC output, is transmitted. The user side recovers the entire AIGC output with a lightweight decoder, saving mobile bandwidth while maintaining high output quality.
3. **Joint Semantic Encoding and Prompt Engineering**: To optimize bandwidth allocation among users, the paper formulates a joint semantic encoding and prompt engineering problem that aims to maximize semantic similarity and output quality while saving wireless bandwidth. To this end, the authors define a new perceptual metric, Joint Perceptual Similarity and Quality (JPSQ), and solve the optimization with the Attention-aware Deep Diffusion (ADD) algorithm.
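The attention-based filtering in item 1 can be illustrated with a minimal sketch. It assumes the MASP has already extracted cross-attention maps from a text-to-image model as a nested list `attn[t][i][j]` (prompt token `t`, image patch `(i, j)`), averages them across tokens to score each patch, and keeps the top `keep_ratio` fraction of patches for transmission. The names `patch_importance`, `select_semantic_patches`, and `keep_ratio`, as well as the mean-over-tokens aggregation, are hypothetical illustrations rather than the paper's actual encoder.

```python
from typing import List, Tuple

# Hypothetical layout: attn[t][i][j] is the cross-attention weight between
# prompt token t and image patch (i, j), as extracted from a text-to-image model.
AttentionMaps = List[List[List[float]]]


def patch_importance(attn: AttentionMaps) -> List[List[float]]:
    """Average the per-token maps into one importance score per patch (assumed aggregation)."""
    num_tokens = len(attn)
    rows, cols = len(attn[0]), len(attn[0][0])
    return [
        [sum(attn[t][i][j] for t in range(num_tokens)) / num_tokens for j in range(cols)]
        for i in range(rows)
    ]


def select_semantic_patches(attn: AttentionMaps, keep_ratio: float) -> List[Tuple[int, int]]:
    """Return the coordinates of the top `keep_ratio` fraction of patches by importance."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = patch_importance(attn)
    # Rank every patch by its averaged attention score, highest first.
    ranked = sorted(
        ((scores[i][j], (i, j)) for i in range(len(scores)) for j in range(len(scores[0]))),
        key=lambda item: item[0],
        reverse=True,
    )
    k = max(1, round(keep_ratio * len(ranked)))  # always keep at least one patch
    return sorted(coord for _, coord in ranked[:k])


# Usage: two prompt tokens attending over a 2x2 patch grid (illustrative values).
attn = [
    [[0.7, 0.1], [0.1, 0.1]],  # e.g. the token "cat"
    [[0.2, 0.5], [0.2, 0.1]],  # e.g. the token "sofa"
]
print(select_semantic_patches(attn, keep_ratio=0.5))  # [(0, 0), (0, 1)]
```

Only the selected patches (here half of the grid) would be sent; averaging treats every prompt token equally, and weighting tokens differently is an equally plausible design choice.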
Experiments show that this method reduces bandwidth consumption by 49.4% on average with almost no perceptible loss in AIGC output quality. In addition, the ADD algorithm clearly outperforms baseline Deep Reinforcement Learning (DRL) methods in convergence speed and bandwidth allocation efficiency, achieving a 1.74× higher overall reward.
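The quantities behind these results can be made concrete with a toy calculation. The summary does not specify how JPSQ fuses its two measurements or how the DRL reward is defined, so this sketch assumes JPSQ is a weighted mean of a similarity score in [0, 1] and an aesthetic score on a 1 to 10 scale rescaled to [0, 1], and that the reward is total JPSQ minus a penalty on the fraction of bits actually sent. `UserResult`, `jpsq`, `average_bandwidth_saving`, `reward`, `weight`, and `bandwidth_penalty` are all hypothetical, and the numbers in the usage example are illustrative, not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserResult:
    full_bits: float      # size of the complete AIGC output
    semantic_bits: float  # size of the transmitted semantic information
    similarity: float     # learned semantic-similarity score, assumed in [0, 1]
    aesthetic: float      # learned aesthetic score, assumed on a 1-10 scale


def jpsq(similarity: float, aesthetic: float, weight: float = 0.5) -> float:
    """Hypothetical JPSQ: weighted mean of similarity and the rescaled aesthetic score."""
    aesthetic_norm = (aesthetic - 1.0) / 9.0  # map 1-10 onto [0, 1]
    return weight * similarity + (1.0 - weight) * aesthetic_norm


def average_bandwidth_saving(users: List[UserResult]) -> float:
    """Mean per-user fraction of bits saved by sending semantics instead of the full output."""
    savings = [1.0 - u.semantic_bits / u.full_bits for u in users]
    return sum(savings) / len(savings)


def reward(users: List[UserResult], bandwidth_penalty: float = 0.1) -> float:
    """Hypothetical reward: total JPSQ minus a penalty on the fraction of bits sent."""
    quality = sum(jpsq(u.similarity, u.aesthetic) for u in users)
    cost = sum(u.semantic_bits / u.full_bits for u in users)
    return quality - bandwidth_penalty * cost


# Usage with two illustrative users.
users = [
    UserResult(full_bits=1000.0, semantic_bits=400.0, similarity=0.9, aesthetic=7.0),
    UserResult(full_bits=1000.0, semantic_bits=600.0, similarity=0.8, aesthetic=6.0),
]
print(f"average saving: {average_bandwidth_saving(users):.1%}")  # average saving: 50.0%
print(f"reward: {reward(users):.3f}")  # reward: 1.361
```

Raising `bandwidth_penalty` pushes an allocation policy toward sending fewer semantic bits at some cost in JPSQ, which is the trade-off the joint optimization has to balance.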