Exploiting Physical Presence Sensing to Secure Voice Assistant Systems.

Bang Tran, Shenhui Pan, Xiaohui Liang, Honggang Zhang
DOI: https://doi.org/10.1109/icc42927.2021.9500792
2021-01-01
Abstract: A Voice Assistant System (VAS) provides a convenient way for users to interact with smart-home devices via a voice interface. However, it raises unique security issues, including voice replay and injection attacks, in which attackers remotely and maliciously control smart-home devices through that interface. In this paper, we consider a typical smart-home scenario in which a VAS device and a compromised speaker device are placed in close physical proximity. The attacker can remotely play malicious voice commands through the speaker device to manipulate the VAS device. We propose a defense system on the VAS device that secures it against both voice replay and injection attacks without any additional devices or extra user effort. Specifically, our system continuously collects voice data and wireless data from the VAS device and extracts Mel-Frequency Cepstral Coefficient (MFCC) features from both. We posit that voice and wireless data are both affected by the physical activities of the users who are present, and that the resulting correlation can be used to detect attacks. Finally, our system applies a deep learning model that learns from previous time-series data and analyzes real-time data to infer whether a real-time voice command was generated by a user or by the speaker device. We tested our system in selected real-world smart-home scenarios. Our experiments showed that the proposed system detects voice replay and injection attacks with a probability between 76.4% and 89.1% in the considered scenarios.
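As a rough illustration of the feature-extraction step described in the abstract, the Python sketch below computes MFCC features from a voice stream and a wireless stream over time-aligned frames and then compares the two. Everything specific here is an assumption made for illustration, not taken from the paper: the sampling rates, frame sizes, filter counts, the synthetic signals, and the helper names `mel_filterbank`, `mfcc`, and `track_correlation`. A plain Pearson correlation stands in for the paper's deep learning model, whose architecture the abstract does not specify.

```python
import numpy as np


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (rows = filters)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(signal, sample_rate, frame_len, hop, n_filters, n_coeffs):
    """Return an (n_frames, n_coeffs) array of MFCC features."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II basis, keeping the first n_coeffs coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T


def track_correlation(feats_a, feats_b):
    """Pearson correlation of the first-coefficient tracks of two streams.

    Stand-in for the paper's learned model (an assumption of this sketch).
    """
    n = min(len(feats_a), len(feats_b))
    return float(np.corrcoef(feats_a[:n, 0], feats_b[:n, 0])[0, 1])


# Synthetic demo data (assumed): 2 s of "voice" at 16 kHz and "wireless"
# samples at 1 kHz, both modulated by the same slow activity envelope.
rng = np.random.default_rng(0)
voice_rate, wireless_rate, seconds = 16000, 1000, 2
t_voice = np.arange(voice_rate * seconds) / voice_rate
t_wireless = np.arange(wireless_rate * seconds) / wireless_rate
voice = (1.0 + 0.5 * np.sin(2 * np.pi * t_voice)) \
    * rng.standard_normal(t_voice.size)
wireless = (1.0 + 0.5 * np.sin(2 * np.pi * t_wireless)) \
    * rng.standard_normal(t_wireless.size)

# 25 ms frames with a 10 ms hop in both streams, so frames line up in time.
voice_feats = mfcc(voice, voice_rate, frame_len=400, hop=160,
                   n_filters=20, n_coeffs=12)
wireless_feats = mfcc(wireless, wireless_rate, frame_len=25, hop=10,
                      n_filters=8, n_coeffs=6)
score = track_correlation(voice_feats, wireless_feats)
print(voice_feats.shape, wireless_feats.shape, score)
```

Using the same frame duration and hop for both streams is a design choice of this sketch: it yields one feature vector per stream per time step, which is the form a time-series model would need in order to compare the two modalities.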