Synchronous Semantic Communications for Video and Speech

Yun Tian, Jingkai Ying, Zhijin Qin, Ye Jin, Xiaoming Tao
DOI: https://doi.org/10.1109/icc51166.2024.10622487
2024-01-01
Abstract: Although semantic communication has shown strong performance across various types of data transmission, the problem of semantic synchronization between multimodal data has not been well studied. Semantic synchronization is challenging because the transmitted information must be synchronized in both the semantic and time domains. In this article, we propose a synchronous semantic communication system for video and speech transmission, with real-time facial transmission adopted as the use case. In particular, to achieve time-domain synchronization, we design an efficient semantic transmitter that sends multimodal data packets. 3D Morphable Model (3DMM) coefficients and text are employed as semantic information, achieving semantic interactivity and lower bandwidth. To address synchronization in the semantic domain, we first employ visual voice cloning at the receiver. A visual-guided speech synthesis module is designed to align text and facial semantics. Thus, the generated speech is synchronized with the video frames in both the semantic and time domains. Simulation results show that the proposed system achieves high-quality synchronous transmission of video and speech with reduced transmission overhead.
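As a rough illustration of the multimodal data packet idea, the sketch below bundles the facial coefficients and the text for one interval under a single timestamp, so a receiver can pair them in time. It is a minimal sketch under stated assumptions: the class name `SemanticPacket`, the field layout, and the byte format are hypothetical and are not taken from the paper.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical header: 8-byte timestamp, 2-byte coefficient count,
# 2-byte text length, all in network byte order (assumed layout).
HEADER_FORMAT = "!QHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class SemanticPacket:
    timestamp_ms: int    # capture time shared by both modalities
    coeffs: List[float]  # 3DMM coefficients for one video frame
    text: str            # text recognized from speech in the same interval

    def pack(self) -> bytes:
        """Serialize the header, float32 coefficients, then UTF-8 text."""
        text_bytes = self.text.encode("utf-8")
        header = struct.pack(
            HEADER_FORMAT, self.timestamp_ms, len(self.coeffs), len(text_bytes)
        )
        body = struct.pack(f"!{len(self.coeffs)}f", *self.coeffs)
        return header + body + text_bytes

    @classmethod
    def unpack(cls, data: bytes) -> "SemanticPacket":
        """Rebuild a packet from bytes produced by pack()."""
        timestamp_ms, n_coeffs, text_len = struct.unpack_from(HEADER_FORMAT, data, 0)
        offset = HEADER_SIZE
        coeffs = list(struct.unpack_from(f"!{n_coeffs}f", data, offset))
        offset += 4 * n_coeffs  # each float32 occupies 4 bytes
        text = data[offset:offset + text_len].decode("utf-8")
        return cls(timestamp_ms, coeffs, text)


packet = SemanticPacket(timestamp_ms=1000, coeffs=[0.5, -0.25, 1.0], text="hello")
payload = packet.pack()
restored = SemanticPacket.unpack(payload)
```

Because both modalities travel in one packet with one timestamp, the receiver does not need to match separate video and speech streams after the fact. The sketch covers only this packaging step. Semantic extraction and visual-guided speech synthesis are outside its scope.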