Defending and Detecting Audio Adversarial Example Using Frame Offsets.

Yongkang Gong, Diqun Yan, Terui Mao, Donghua Wang, Rangding Wang
DOI: https://doi.org/10.3837/tiis.2021.04.019
2021-01-01
KSII Transactions on Internet and Information Systems
Abstract: Machine learning models are vulnerable to adversarial examples, which are generated by adding a deliberately designed perturbation to a benign sample. In particular, for automatic speech recognition (ASR) systems, audio that sounds normal to a listener could be decoded as a harmful command as a result of an adversarial attack. In this paper, we focus on countermeasures against audio adversarial examples. By analyzing the characteristics of ASR systems, we find that frame offsets, created by appending a silence clip at the beginning of an audio sample, can degrade adversarial perturbations into ordinary noise. For different scenarios, we exploit frame offsets through different strategies: defending, detecting, and a hybrid of the two. Compared with previous methods, our proposed method defends against audio adversarial examples in a simpler, more generic, and more efficient way. Evaluations against three state-of-the-art adversarial attacks, each targeting a different ASR system, demonstrate that the proposed method effectively improves the robustness of ASR systems.
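The abstract names three ways to use frame offsets: defending, detecting, and a hybrid of the two. Below is a minimal Python sketch of how those strategies could be wired together. It rests on several assumptions. The ASR system is modeled as a hypothetical callable that maps a list of samples to a transcript. The offset is expressed as a count of prepended zero-valued samples. Detection compares the transcripts of the original and offset audio using character error rate against a hypothetical threshold. The abstract does not specify the offset length, the comparison metric, or the threshold, so none of these should be read as the authors' exact settings.

```python
from typing import Callable, List, Tuple

# Assumption: an ASR system is modeled as any function that maps
# a list of PCM samples to a transcript string.
ASR = Callable[[List[float]], str]


def prepend_silence(audio: List[float], offset_samples: int) -> List[float]:
    """Create a frame offset by prepending `offset_samples` zero-valued samples."""
    if offset_samples < 0:
        raise ValueError("offset_samples must be non-negative")
    return [0.0] * offset_samples + list(audio)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def defend(asr: ASR, audio: List[float], offset_samples: int) -> str:
    """Defending: always transcribe the offset audio instead of the original."""
    return asr(prepend_silence(audio, offset_samples))


def detect(asr: ASR, audio: List[float], offset_samples: int,
           threshold: float) -> Tuple[bool, str, str]:
    """Detecting: flag the input if the offset changes the transcript too much."""
    original = asr(audio)
    shifted = asr(prepend_silence(audio, offset_samples))
    flagged = character_error_rate(original, shifted) > threshold
    return flagged, original, shifted


def hybrid(asr: ASR, audio: List[float], offset_samples: int,
           threshold: float) -> Tuple[bool, str]:
    """Hybrid: keep the original transcript unless the input is flagged."""
    flagged, original, shifted = detect(asr, audio, offset_samples, threshold)
    return flagged, (shifted if flagged else original)


def toy_asr(audio: List[float]) -> str:
    """Hypothetical stand-in for an ASR system whose output depends on
    frame alignment: a large first sample imitates an aligned perturbation."""
    return "delete all files" if audio and audio[0] > 0.4 else "hello world"


if __name__ == "__main__":
    adversarial = [0.5, 0.1, 0.2]
    benign = [0.1, 0.1, 0.2]
    print(hybrid(toy_asr, adversarial, offset_samples=160, threshold=0.2))
    print(hybrid(toy_asr, benign, offset_samples=160, threshold=0.2))
```

The `toy_asr` function only imitates the alignment sensitivity described in the abstract, so the offset shifts the "perturbation" away from the position the toy model reacts to. With a real ASR system, the offset length and the detection threshold would have to be chosen empirically, and a word-level error rate could be substituted for the character-level one.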