Radar Can See and Hear as Well: A New Multimodal Benchmark Based on Radar Sensing

Yinsong Xu, Qingchao Chen
DOI: https://doi.org/10.1109/jiot.2024.3396285
IF: 10.6
2024-07-27
IEEE Internet of Things Journal
Abstract: Radar technology has emerged as a pivotal component for various applications within the Internet of Things (IoT). To promote the understanding and integration of radar sensing in developing multimodal applications, we introduce the Radar Can See and Hear (RACER) data set. This data set encompasses synchronized radar sensing, audio, and visual data. Radar, with its capability to detect vocal cord vibrations and lip movements, addresses scenarios where conventional microphone and camera setups may falter, such as through-wall or non-line-of-sight sensing. Specifically, the radar discerns and characterizes human lip and vocal cord movements in the range-Doppler domain. We employ deep neural networks to capture the inherent relationships among radar signatures, audio vocal sound, and visual lip movements during human pronunciation. We evaluate the performance of radar sensing through experiments on speech classification, cross-modality retrieval among audio, video, and radar, and cross-modality distillation from video or audio to radar. We summarize the findings and the limitations of using radar sensing in speech-related multimodal analysis applications. Our code is available at: https://github.com/SPIresearch/RACER.
computer science, information systems; telecommunications; engineering, electrical & electronic
What problem does this paper attempt to address?