Task-Oriented Multi-User Semantic Communications for VQA Task

Huiqiang Xie, Zhijin Qin, Geoffrey Ye Li
DOI: https://doi.org/10.48550/arXiv.2108.07357
2021-12-15
Abstract: Semantic communications focus on the transmission of semantic features. In this letter, we consider a task-oriented multi-user semantic communication system for multimodal data transmission. Particularly, some users transmit images while the others transmit texts to inquire about the images. To exploit the correlation among the multimodal data from multiple users, we propose a deep neural network enabled semantic communication system, named MU-DeepSC, to execute the visual question answering (VQA) task as an example. Specifically, the transceiver for MU-DeepSC is designed and optimized jointly to capture the features from the correlated multimodal data for task-oriented transmission. Simulation results demonstrate that the proposed MU-DeepSC is more robust to channel variations than traditional communication systems, especially in the low signal-to-noise ratio (SNR) regime.
Signal Processing
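The abstract's low-SNR robustness claim is easiest to appreciate against a bit-level system that must recover every bit exactly. The minimal sketch below assumes uncoded BPSK over an AWGN channel with independent bit errors and treats "SNR" as Eb/N0; this is a textbook illustration, not the baseline used in the paper, and the function names are hypothetical. It shows how the probability of recovering a 1024-bit message without a single error collapses as the SNR drops.

```python
import math

def bpsk_ber(snr_db):
    """Bit error rate of uncoded BPSK over AWGN: Q(sqrt(2*Eb/N0)) = 0.5*erfc(sqrt(Eb/N0)).
    Assumption: snr_db is interpreted as Eb/N0 in dB."""
    ebn0 = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def exact_recovery_prob(snr_db, n_bits):
    """Probability that all n_bits are received correctly, assuming independent bit errors."""
    return (1.0 - bpsk_ber(snr_db)) ** n_bits

# Sweep the SNR and report BER and the chance that a 1024-bit message is error-free.
for snr_db in (-2, 0, 2, 4, 6, 8, 10):
    print(f"SNR {snr_db:>3} dB: BER {bpsk_ber(snr_db):.2e}, "
          f"P(1024 bits all correct) {exact_recovery_prob(snr_db, 1024):.3e}")
```

Channel coding moves this cliff to a lower SNR but does not remove it, whereas a task-oriented semantic system only needs the task output (here, the VQA answer) to be correct.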
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to effectively transmit and fuse multimodal data (such as images and text) in a multi-user environment to perform a specific task, such as visual question answering (VQA). Specifically, the paper proposes a deep-learning-based semantic communication system, MU-DeepSC, which jointly designs the semantic encoder and the channel encoder to capture and transmit the semantic features of multimodal data that are correlated across multiple users.

### Specific description of the problem:

1. **Limitations of traditional communication systems**:
   - Traditional communication systems convert data into bit streams and require the receiver to recover these bits exactly. This relies on good channel conditions and a high signal-to-noise ratio (SNR), so these systems perform poorly at low SNR.
   - Semantic communication transmits and recovers the meaning of the content directly, without requiring exact bit recovery, which makes it more robust to channel variations.
2. **Requirements for multimodal data**:
   - In practical communication scenarios, the system needs to collect, transmit, and fuse multimodal data (such as images and text) from multiple users.
   - Multimodal data can provide more information, introduce new degrees of freedom, and improve the performance of intelligent tasks.
3. **Task-oriented challenges**:
   - How to extract appropriate semantic information from each user.
   - How to build a model that fuses the multimodal semantic information from multiple users.

### Solutions proposed in the paper:

- **MU-DeepSC framework**:
  - A new task-oriented semantic communication system for multimodal data, MU-DeepSC, is proposed, in which the transceiver is jointly designed to perform intelligent tasks.
  - Taking the VQA task as an example, the effectiveness of MU-DeepSC is demonstrated.
- **Key technologies**:
  - **Image transmitter**: Uses ResNet-101 for semantic encoding and a CNN for channel encoding.
  - **Text transmitter**: Uses a Bi-LSTM for semantic encoding and a fully connected layer for channel encoding.
  - **Receiver**: Uses convolutional and fully connected layers to decode the image and text information, and fuses the semantic information with a MAC (Memory, Attention, and Composition) network to answer the question.
- **Optimization and training**:
  - The cross-entropy (CE) loss measures the difference between the correct answer and the predicted answer and is minimized to optimize the network parameters.
  - The entire system is trained with gradient descent.

### Summary:

The paper addresses the effective transmission and fusion of multimodal data in a multi-user environment. In particular, it achieves task-oriented semantic communication through deep learning, so that high task accuracy is maintained even at low SNR.
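The training recipe above (answer prediction scored with a CE loss and optimized by gradient descent, with features crossing a noisy channel) can be illustrated with a toy sketch in pure Python. Everything here is an assumption for illustration, not the paper's implementation: each answer has one hypothetical "semantic feature" prototype, the channel is AWGN with unit-average-power normalization, and a single linear softmax classifier stands in for the receiver's channel decoder and MAC network. The names `normalize_power`, `awgn`, `sgd_step`, and `make_dataset`, the dimensions, the SNR, and the learning rate are all hypothetical.

```python
import math
import random

def normalize_power(x):
    """Scale a real symbol vector to unit average power (assumed transmit-power constraint)."""
    p = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(p) for v in x]

def awgn(x, snr_db, rng):
    """Add Gaussian noise of variance 10**(-snr_db/10) per real symbol (assumes unit signal power)."""
    sigma = math.sqrt(10 ** (-snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in x]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits_of(W, b, y):
    """Linear stand-in for the receiver: one score per candidate answer."""
    return [sum(w * v for w, v in zip(row, y)) + bi for row, bi in zip(W, b)]

def ce_loss(W, b, y, label):
    """Cross-entropy between the predicted answer distribution and the true answer."""
    return -math.log(softmax(logits_of(W, b, y))[label])

def sgd_step(W, b, y, label, lr):
    """One gradient-descent step; for softmax + CE, dL/dlogit_i = p_i - [i == label]."""
    p = softmax(logits_of(W, b, y))
    for i in range(len(W)):
        g = p[i] - (1.0 if i == label else 0.0)
        for j in range(len(y)):
            W[i][j] -= lr * g * y[j]
        b[i] -= lr * g

def make_dataset(prototypes, n_per_class, snr_db, rng):
    """Hypothetical data: each answer's prototype feature is sent over AWGN n_per_class times."""
    data = []
    for label, proto in enumerate(prototypes):
        tx = normalize_power(proto)
        for _ in range(n_per_class):
            data.append((awgn(tx, snr_db, rng), label))
    return data

def mean_loss(W, b, data):
    return sum(ce_loss(W, b, y, t) for y, t in data) / len(data)

def accuracy(W, b, data):
    hits = 0
    for y, t in data:
        z = logits_of(W, b, y)
        hits += int(max(range(len(z)), key=z.__getitem__) == t)
    return hits / len(data)

rng = random.Random(0)
dim, n_answers = 8, 3
prototypes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_answers)]
train = make_dataset(prototypes, 30, snr_db=0.0, rng=rng)

W = [[0.0] * dim for _ in range(n_answers)]
b = [0.0] * n_answers
loss_before = mean_loss(W, b, train)  # ln(3): all logits start at 0, so predictions are uniform
for epoch in range(20):
    rng.shuffle(train)
    for y, t in train:
        sgd_step(W, b, y, t, lr=0.05)
loss_after = mean_loss(W, b, train)
print(f"CE loss: {loss_before:.3f} -> {loss_after:.3f}, accuracy: {accuracy(W, b, train):.2f}")
```

Because the softmax-plus-CE gradient with respect to the logits is simply the predicted distribution minus the one-hot target, the update rule stays short. In the actual system, the same loss would be backpropagated through the receiver, the channel, and both transmitters.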