Toward General Cross-Modal Signal Reconstruction for Robotic Teleoperation
Yanan Chen, Ang Li, Dan Wu, Liang Zhou
DOI: https://doi.org/10.1109/tmm.2023.3312944
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Multi-modal robotic teleoperation, an important application of human-computer interaction (HCI), plays a significant role in domains such as industry, healthcare, and education. However, existing robotic teleoperation systems face two main challenges with multi-modal signals: designing a cross-modal communication architecture that meets the requirements of diverse modalities, and ensuring high-quality cross-modal signal reconstruction even under poor network conditions. To this end, this work proposes a general cross-modal signal reconstruction scheme that exploits the correlation among signals of different modalities. Specifically, we first propose a scalable cross-modal communication architecture that meets the diverse needs of various modality signals through multi-modal encoding and multi-directional decoding, eliminating the need for a specialized feature extraction model. Next, we design a cross-modal signal reconstruction method based on a masked auto-encoder with discriminator assistance (MAE-D). Following the idea of generative adversarial training, it combines the codec that reconstructs the signal with a discriminator that assesses the authenticity of the reconstructed signal, in order to achieve accurate and efficient cross-modal signal reconstruction. Finally, numerical experiments on our self-built multi-modal dataset, a public dataset, and a teleoperation simulation platform demonstrate that the proposed scheme offers significant advantages in cross-modal signal reconstruction.
computer science, information systems, telecommunications, software engineering
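
The MAE-D idea summarized in the abstract (a masked auto-encoder whose reconstruction is also judged by a discriminator) can be illustrated with a minimal sketch of the two training objectives. This is not the authors' implementation: the abstract does not state the loss functions, so the masked mean-squared error, the non-saturating adversarial term, the `adv_weight` value, and all function names below are assumptions chosen for illustration only.

```python
import math
import random

EPS = 1e-12  # guards against log(0); an illustrative choice


def random_mask(length, mask_ratio, rng):
    """Return a list of booleans; True marks a position hidden from the encoder."""
    n_masked = int(round(length * mask_ratio))
    masked_idx = set(rng.sample(range(length), n_masked))
    return [i in masked_idx for i in range(length)]


def masked_mse(target, reconstruction, mask):
    """Mean squared error computed over the masked positions only."""
    errs = [(t - r) ** 2 for t, r, m in zip(target, reconstruction, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def generator_loss(target, reconstruction, mask, disc_prob_real, adv_weight=0.1):
    """Codec objective (assumed form): reconstruction error plus an adversarial term.

    disc_prob_real is the discriminator's probability that the reconstructed
    signal is real; the codec is rewarded when this probability is high.
    """
    adversarial = -math.log(max(disc_prob_real, EPS))
    return masked_mse(target, reconstruction, mask) + adv_weight * adversarial


def discriminator_loss(prob_real_on_real, prob_real_on_fake):
    """Binary cross-entropy: score real signals high, reconstructions low."""
    return -(math.log(max(prob_real_on_real, EPS))
             + math.log(max(1.0 - prob_real_on_fake, EPS)))


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [0.0, 1.0, 2.0, 3.0]           # hypothetical toy signal
    mask = random_mask(len(signal), 0.5, rng)
    guess = [0.1, 0.9, 2.2, 2.7]            # hypothetical codec output
    print(generator_loss(signal, guess, mask, disc_prob_real=0.6))
    print(discriminator_loss(prob_real_on_real=0.9, prob_real_on_fake=0.6))
```

In an actual system the two probabilities would come from a trained discriminator network and the reconstruction from the codec; here they are passed in as plain numbers so that the interplay of the two objectives stays visible.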