Tactile Codec with Visual Assistance in Multi-modal Communication for Digital Health

Mingkai Chen, Xinmeng Tan, Huiyan Han, Lei Wang
DOI: https://doi.org/10.1007/s11036-024-02294-z
2024-02-25
Mobile Networks and Applications
Abstract: In digital health, with the development of communication technologies, medical information in all modalities is growing exponentially. Effective communication of multi-modal data, including tactile and visual information, is therefore paramount. In this paper, we propose a novel method to compress tactile video data from GelSight sensors for digital health applications. First, our method combines the visual and tactile modalities to extract saliency information from the tactile videos. A target recognition network is designed as visual assistance: by recognizing whether objects are in contact, it helps select the tactile video frames that carry effective information. Second, we design a dedicated coding scheme for inter- and intra-frame prediction to further extract saliency information and compress the tactile signal. Inter-frame prediction uses a dynamic group of pictures (GOP) strategy to reduce temporal redundancy, and intra-frame prediction based on low-rank sparse decomposition (LRSD) then further improves compression efficiency. Finally, extensive evaluation with metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and learned perceptual image patch similarity (LPIPS) shows that our method outperforms advanced video coding (AVC) and high efficiency video coding (HEVC), achieving average bitrate savings of 23.6% relative to HEVC and 61.4% relative to AVC. These results show that the proposed method can greatly reduce the amount of haptic data while maintaining high reconstruction quality.
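The LRSD step in the abstract splits a frame into a low-rank part (smooth, slowly varying background) and a sparse part (localized detail such as contact regions). The abstract does not specify the decomposition algorithm, so the following is only a minimal Python/NumPy sketch that assumes a GoDec-style alternating scheme: a truncated SVD for the low-rank part and hard thresholding to a fixed number of entries for the sparse part. The function names `lrsd` and `psnr`, the `rank`/`card` parameters, the 8-bit peak value of 255, and the synthetic test frame are hypothetical illustrations, not the authors' implementation. PSNR, one of the cited metrics, is used to check how well L + S reproduces the frame.

```python
import numpy as np

# Hypothetical sketch: a GoDec-style low-rank + sparse decomposition (LRSD).
# It is an illustrative stand-in, not the paper's exact algorithm.
def lrsd(X, rank, card, iters=20):
    """Approximate X as L + S, with rank(L) <= rank and at most `card` nonzeros in S."""
    X = np.asarray(X, dtype=np.float64)
    S = np.zeros_like(X)
    L = np.zeros_like(X)
    for _ in range(iters):
        # Low-rank step: best rank-`rank` approximation of X - S (truncated SVD).
        U, sig, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank, :]
        # Sparse step: keep only the `card` largest-magnitude entries of X - L.
        R = X - L
        S = np.zeros_like(X)
        if card > 0:
            idx = np.argsort(np.abs(R), axis=None)[-card:]
            S.flat[idx] = R.flat[idx]
    return L, S


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit pixel values."""
    mse = np.mean((np.asarray(ref, np.float64) - np.asarray(test, np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Synthetic "tactile frame": a rank-2 background plus a few sparse contact spikes.
rng = np.random.default_rng(0)
background = 100.0 * rng.random((64, 2)) @ rng.random((2, 64))
spikes = np.zeros((64, 64))
spikes[rng.integers(0, 64, 20), rng.integers(0, 64, 20)] = 80.0
X = background + spikes

L, S = lrsd(X, rank=2, card=20)
print(f"PSNR of L + S: {psnr(X, L + S):.2f} dB, low-rank only: {psnr(X, L):.2f} dB")
```

In a codec built along these lines, L could be stored compactly through its truncated SVD factors and S as a short list of (position, value) pairs; how the paper actually codes these two parts is not stated in the abstract.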
computer science, information systems, telecommunications, hardware & architecture