Learned Source and Channel Coding for Talking-Head Semantic Transmission

Weijie Yue, Jincheng Dai, Sixian Wang, Zhongwei Si, Kai Niu
DOI: https://doi.org/10.1109/WCNC55385.2023.10118851
2023-01-01
Abstract: How can a special class of video be transmitted efficiently over wireless channels? Established systems combine H.26x video coding with 5G LDPC channel coding, but their end-to-end transmission efficiency is still far from the limit for video sources in a specific domain. In this paper, we design a semantic communication system tailored for transmitting video-calling streams over wireless channels. Inspired by recent progress in talking-head animation, we propose a talking-head semantic transmission (THST) system, which efficiently transmits a motion keypoint representation as compact semantic information to drive free-view talking-head synthesis at the receiver. Since the motion semantic keypoints are correlated, our THST system learns a nonlinear analysis transform to map the keypoints across multiple frames into a latent space, then transmits the latent hyper semantic representation to the receiver via deep joint source-channel coding. Our system incorporates a latent prior to estimate the importance diversity of the semantic keypoints; accordingly, we realize variable-rate joint source-channel coding to obtain a system-level coding gain. Extensive experimental validation shows that our THST system outperforms engineered competing systems on benchmark datasets. Moreover, owing to the system-level joint source and channel design, our method provides much more robust performance over noisy channels with only 33% bandwidth cost versus the current talking-head compression combined with 5G LDPC coded transmission systems.
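The variable-rate idea in the abstract (a latent prior estimates how important each latent element is, and channel bandwidth is assigned accordingly) can be illustrated with a minimal sketch. This is not the authors' implementation: the THST system uses learned neural transforms and a learned prior, whereas the sketch below assumes a zero-mean Gaussian prior with known per-element scale `sigma` and a simple entropy-proportional allocation rule. The function names `gaussian_entropy_bits` and `allocate_symbols` are hypothetical.

```python
import numpy as np


def gaussian_entropy_bits(sigma, eps=1e-9):
    """Differential entropy (bits) of N(0, sigma^2), clipped at zero.

    Assumption: used here only as a rough importance proxy for each
    latent element; the paper's learned prior is not reproduced.
    """
    sigma = np.asarray(sigma, dtype=float)
    h = 0.5 * np.log2(2.0 * np.pi * np.e * (sigma ** 2 + eps))
    return np.maximum(h, 0.0)


def allocate_symbols(sigma, total_symbols):
    """Split a channel-symbol budget in proportion to estimated entropy.

    Elements with a wider prior (more uncertainty) receive more channel
    symbols. Leftover symbols after flooring go to the elements with the
    largest fractional shares, so the budget is used exactly.
    """
    h = gaussian_entropy_bits(sigma)
    if h.sum() == 0.0:
        # Nothing is estimated to carry information: allocate nothing.
        return np.zeros(h.shape, dtype=int)
    raw = h / h.sum() * total_symbols
    alloc = np.floor(raw).astype(int)
    remainder = int(total_symbols - alloc.sum())
    # Indices sorted by descending fractional part of the ideal share.
    order = np.argsort(-(raw - alloc))
    alloc[order[:remainder]] += 1
    return alloc


if __name__ == "__main__":
    # Three hypothetical latent elements with increasing prior scale.
    print(allocate_symbols([0.1, 1.0, 3.0], total_symbols=16))
```

In this toy setting the element with the smallest prior scale is judged nearly deterministic and receives few or no symbols, while the widest one receives the largest share. A learned system would replace both the prior and the allocation rule with trained components.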