Abstract: Semantic communications focus on the transmission of semantic features. In this letter, we consider a task-oriented multi-user semantic communication system for multimodal data transmission. In particular, some users transmit images while the others transmit texts to inquire about the images. To exploit the correlation among the multimodal data from multiple users, we propose a deep neural network enabled semantic communication system, named MU-DeepSC, to execute the visual question answering (VQA) task as an example. Specifically, the transceiver for MU-DeepSC is designed and optimized jointly to capture the features from the correlated multimodal data for task-oriented transmission. Simulation results demonstrate that the proposed MU-DeepSC is more robust to channel variations than traditional communication systems, especially in the low signal-to-noise ratio (SNR) regime.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to effectively transmit and fuse multimodal data (such as images and text) in a multi-user environment to perform specific tasks, such as Visual Question Answering (VQA). Specifically, the paper proposes a deep-learning-based semantic communication system (MU-DeepSC), which aims to capture and transmit the semantic features of correlated multimodal data from multiple users by jointly designing the semantic encoder and the channel encoder.

### Specific description of the problem:

1. **Limitations of traditional communication systems**:
   - Traditional communication systems convert data into bit streams and require the receiving end to accurately recover these bits. This depends on good channel conditions and a high signal-to-noise ratio (SNR), and performs poorly under low SNR conditions.
   - Semantic communication directly transmits and recovers the meaning of the content without the need for precise bit recovery, so it is more robust to channel changes.
2. **Requirements for multimodal data**:
   - In actual communication scenarios, the system needs to collect, transmit, and fuse multimodal data (such as images, text, etc.) from multiple users.
   - Multimodal data can provide more information, introduce new degrees of freedom, and improve the performance of intelligent tasks.
3. **Task-oriented challenges**:
   - How to extract appropriate semantic information from each user.
   - How to build a model to fuse the multimodal semantic information of multiple users.

### Solutions proposed in the paper:

- **MU-DeepSC framework**:
  - A new task-oriented multimodal data semantic communication system (MU-DeepSC) is proposed, in which the transceiver is jointly designed to perform intelligent tasks.
  - Taking the Visual Question Answering (VQA) task as an example, the effectiveness of MU-DeepSC is demonstrated.
- **Key technologies**:
  - **Image transmitter**: Use ResNet-101 for semantic encoding and a CNN for channel encoding.
  - **Text transmitter**: Use a Bi-LSTM for semantic encoding and a fully-connected layer for channel encoding.
  - **Receiver**: Use convolutional layers and fully-connected layers to decode image and text information, and fuse semantic information through the MAC network to answer questions.
- **Optimization and training**:
  - Use the cross-entropy (CE) loss function to measure the difference between the correct answer and the predicted answer, thereby optimizing network parameters.
  - Train the entire system by the gradient descent method.

### Summary:

The paper aims to solve the problem of effective transmission and fusion of multimodal data in a multi-user environment, in particular to achieve task-oriented semantic communication through deep-learning techniques, so as to maintain high task accuracy even under low signal-to-noise ratio conditions.
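To make two of the ingredients above more concrete, here is a minimal Python sketch of a noisy channel at a chosen SNR and of the cross-entropy loss between a predicted answer distribution and the correct answer. It is an illustration under stated assumptions, not the paper's implementation: the summary does not specify the channel model, so a real-valued additive white Gaussian noise (AWGN) channel is assumed, and the function names (`normalize_power`, `awgn_channel`, `cross_entropy`) and all numeric values are hypothetical.

```python
import math
import random


def normalize_power(symbols):
    """Scale the symbols so that their average power equals 1."""
    power = sum(s * s for s in symbols) / len(symbols)
    scale = 1.0 / math.sqrt(power) if power > 0 else 1.0
    return [s * scale for s in symbols]


def awgn_channel(symbols, snr_db, rng):
    """Add real Gaussian noise (assumed channel model).

    Assumes unit-power input, so noise variance = 1 / SNR (linear scale).
    """
    snr_linear = 10 ** (snr_db / 10.0)
    noise_std = math.sqrt(1.0 / snr_linear)
    return [s + rng.gauss(0.0, noise_std) for s in symbols]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy between the predicted distribution and the correct answer."""
    return -math.log(softmax(logits)[target_index])


rng = random.Random(0)                 # fixed seed for repeatability
features = [0.5, -1.2, 0.3, 2.0]       # made-up encoder output
tx = normalize_power(features)         # transmitted symbols, unit average power
rx_low = awgn_channel(tx, snr_db=0.0, rng=rng)    # low-SNR reception
rx_high = awgn_channel(tx, snr_db=20.0, rng=rng)  # high-SNR reception

answer_logits = [2.0, 0.5, -1.0]       # made-up scores over three candidate answers
loss = cross_entropy(answer_logits, target_index=0)
```

In an end-to-end system of the kind described, the channel would sit between the transmitter and receiver networks, and the loss would be minimized by gradient descent over all network parameters; the sketch only shows the two pieces in isolation.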

Task-Oriented Multi-User Semantic Communications for VQA Task

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

A Unified Multi-Task Semantic Communication System with Domain Adaptation

A Unified Multi-Task Semantic Communication System for Multimodal Data

Semantic Communication Approach for Multi-Task Image Transmission

One-to-Many Semantic Communication Systems: Design, Implementation, Performance Evaluation

A Multi-Task Semantic Communication System for Natural Language Processing

Deep Learning-Enabled Semantic Communication Systems With Task-Unaware Transmitter and Dynamic Data

Multi-User Semantic Communications for Cooperative Object Identification

Deep Learning-Based Image Semantic Communication System

Semantic Communication for Multi-modal Data Transmission

Non-Orthogonal Multiple Access Enhanced Multi-User Semantic Communication

Multi-Modal Fusion-Based Multi-Task Semantic Communication System

Vector Quantized Semantic Communication System

Deep Learning Enabled Task-Oriented Semantic Communication for Memory-Limited Devices

SemHARQ: Semantic-Aware HARQ for Multi-task Semantic Communications

Task-oriented Explainable Semantic Communications

Semantic Communication for Cooperative Multi-Task Processing over Wireless Networks

QoE-based Semantic-Aware Resource Allocation for Multi-Task Networks

Mem-DeepSC: A Semantic Communication System with Memory