Indelible “footprints” of Inaudible Command Injection

Zhongjie Ba, Bin Gong, Yuwei Wang, Yuxuan Liu, Peng Cheng, Feng Lin, Li Lu, Kui Ren
DOI: https://doi.org/10.1109/tifs.2024.3459728
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Inaudible command injection transmits ultrasound to inject adversarial speech commands into a voice assistant, thereby manipulating voice control systems (e.g., a garage door or a security camera) for illegitimate purposes. Although the attack is inaudible, we find that it leaves visible “footprints”. These attack “footprints” are a byproduct of the interaction between the attack signal (i.e., the input) and the acoustic components (i.e., the transfer function), so they reflect the hardware characteristics of the sound capture system, including the microphone diaphragm, the low-pass filter, and the analog-to-digital converter. Moreover, unlike non-linearity distortion, which can be erased with signal-shaping techniques, the “footprints” are indelible because they are unrelated to the content of the injected commands. We discover two types of indelible “footprints” embedded in the recording spectrogram, namely abnormal interfering noise and abnormal demodulation. We further design a software-based detection method and a portable detector, DolphinTag, to identify these “footprints”. The software-based method achieves a detection accuracy of 99.8% on the phone models exhibiting abnormal interfering noise, and DolphinTag, which detects the ultrasound attack by actively facilitating the abnormal demodulation, achieves 100% detection accuracy.