Exploring the Benefits of Cross-Modal Coding

Zhe Yuan, Bin Kang, Xin Wei, Liang Zhou
DOI: https://doi.org/10.1109/tcsvt.2022.3196586
IF: 5.859
2022-12-10
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Multi-modal services, which typically integrate audio, video, and haptic signals, will become an inevitable application trend in 5G and beyond. However, because of the essential differences between haptic and audio/video signals, existing coding schemes usually fail to meet the critical requirements on rate-distortion performance. Inspired by the observation that hearing, sight, and touch are highly correlated, we show that this correlation can be exploited for compression by proposing a cross-modal coding framework, which compresses multi-modal signals with the aid of their semantic correlation. In particular, this work addresses three fundamental technical problems: i) how to exploit the semantic correlation among different modalities, ii) how much benefit cross-modal coding can provide, and iii) how to design a general cross-modal codec. On the theoretical side, by investigating their semantic correlation, we determine the minimum number of bits required to compress haptic signals under given video stream rate conditions. On the technical side, we design a general cross-modal codec that approaches this optimal compression limit using AI-enabled cross-modal prediction and channel coding. Numerical results demonstrate that the proposed cross-modal coding achieves significant gains over existing schemes, especially when the multi-modal signals are strongly semantically correlated.
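The abstract does not state the bound itself. As a hedged illustration of why correlated side information can shrink the haptic bit budget, the sketch below uses the textbook lossless case rather than the paper's rate-distortion analysis. If the decoder has a video-derived side signal Y, the minimum rate for a haptic source X drops from H(X) to H(X|Y) (the Slepian-Wolf result), and the saving equals the mutual information I(X;Y). The joint distribution, the symbol names, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal pmf of a joint {(x, y): p} over axis 0 (haptic X) or 1 (video Y)."""
    out = {}
    for (x, y), p in joint.items():
        key = x if axis == 0 else y
        out[key] = out.get(key, 0.0) + p
    return out

def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y), in bits."""
    return entropy(joint) - entropy(marginal(joint, 1))

# Hypothetical toy model: a quantized haptic symbol X and a video
# semantic label Y (whether the video shows contact or not).
joint = {
    ("high", "contact"): 0.40, ("low", "contact"): 0.10,
    ("high", "no_contact"): 0.05, ("low", "no_contact"): 0.45,
}

h_x = entropy(marginal(joint, 0))          # rate without side information
h_x_given_y = conditional_entropy(joint)   # rate with video as side information
print(f"H(X)   = {h_x:.3f} bits/symbol")
print(f"H(X|Y) = {h_x_given_y:.3f} bits/symbol")
print(f"saving = {h_x - h_x_given_y:.3f} bits/symbol (mutual information)")
```

The stronger the dependence between X and Y, the larger the gap between the two rates. This mirrors, in a simplified lossless setting, the abstract's claim that gains are largest when the modalities are strongly semantically correlated.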
engineering, electrical & electronic
What problem does this paper attempt to address?