LaMoSC: Large Language Model-Driven Semantic Communication System for Visual Transmission

Yaru Zhao, Yi Yue, Shoulu Hou, Bo Cheng, Yakun Huang
DOI: https://doi.org/10.1109/tccn.2024.3401712
2024-01-01
Abstract: The advancement of artificial intelligence (AI) has the potential to revolutionize network communication. Advanced feature extraction in semantic communication can enhance transmission capacity. However, approaches that rely solely on unimodal visual features extracted from images may decode inaccurately under low signal-to-noise ratio (SNR) conditions. This paper introduces LaMoSC, a semantic communication system driven by large language models (LLMs) that uses multimodal features to reconstruct raw visual information, thereby improving transmission quality. The paper proposes an LLM-driven multimodal fusion semantic communication framework that extends unimodal transmission systems and enhances generalization ability. LaMoSC uses an end-to-end encoding-decoding network that takes both visual and textual features as input and integrates them with an attention mechanism. Comprehensive comparisons with state-of-the-art baselines across various datasets demonstrate the robustness of the proposed method, particularly in low SNR conditions. At a low SNR such as 4 dB, LaMoSC outperforms Deep-JSCC and the multi-level semantic aware communication system (MLSC) by 5.5% and 2.6%, respectively. Its generalization capacity also sets it apart from the other methods.
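To make the two ideas in the abstract concrete (attention-based fusion of visual and textual features, and transmission at a fixed SNR), here is a minimal NumPy sketch. It is not the LaMoSC architecture: the abstract does not specify the attention variant or the channel model, so scaled dot-product cross-attention with a residual connection and an additive white Gaussian noise (AWGN) channel are assumed here. The function names `cross_attention` and `awgn`, and the feature shapes, are hypothetical.

```python
import numpy as np

def cross_attention(visual, text):
    """Fuse text features into visual features (assumed: scaled dot-product attention).

    visual: (n_v, d) array used as queries.
    text:   (n_t, d) array used as keys and values.
    """
    d = visual.shape[-1]
    scores = visual @ text.T / np.sqrt(d)            # (n_v, n_t) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over text tokens
    return visual + weights @ text                   # residual fusion (assumption)

def awgn(signal, snr_db, rng):
    """Add white Gaussian noise so the expected SNR equals snr_db (assumed channel)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), signal.shape)

rng = np.random.default_rng(0)
visual = rng.normal(size=(16, 8))   # hypothetical: 16 visual tokens, 8-dim features
text = rng.normal(size=(4, 8))      # hypothetical: 4 textual tokens, 8-dim features

fused = cross_attention(visual, text)
received = awgn(fused, snr_db=4.0, rng=rng)   # 4 dB is the low-SNR point cited above
```

In a learned system the queries, keys, and values would pass through trained projections and the decoder would be optimized end to end against the noisy channel; the sketch only shows the data flow.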