Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise

Mingyu Dong, Diqun Yan, Yongkang Gong, Rangding Wang
DOI: https://doi.org/10.48550/arXiv.2108.13562
2021-10-17
Abstract: An automatic speech recognition (ASR) system based on a deep neural network is vulnerable to attack by an adversarial example, especially if the command-dependent ASR fails. A defense method against adversarial examples is proposed to improve the robustness and security of the ASR system. We propose an algorithm of devastation and detection on adversarial examples that can attack current advanced ASR systems. We choose an advanced text- and command-dependent ASR system as our target, generating adversarial examples by an optimization-based attack on text-dependent ASR and the GA-based algorithm on command-dependent ASR. The method is based on input transformation of adversarial examples. Different random intensities and kinds of noise are added to adversarial examples to devastate the perturbation previously added to normal examples. Experimental results show that the method performs well. For the devastation of examples, the original speech similarity after adding noise can reach 99.68%, the similarity of adversarial examples can reach zero, and the detection rate of adversarial examples can reach 94%.
Sound, Cryptography and Security, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
The paper attempts to address the security and robustness issues of Automatic Speech Recognition (ASR) systems when faced with adversarial sample attacks. Specifically, ASR systems based on deep neural networks are susceptible to adversarial samples, especially in command-dependent ASR systems. To enhance the security and robustness of ASR systems, the paper proposes a method to disrupt and detect adversarial samples by adding random noise.

### Main Research Content

1. **Generation of Adversarial Samples**:
   - The paper uses two methods to generate adversarial samples:
     - **Optimization Method (OPT)**: Based on gradient optimization, suitable for text-dependent ASR systems.
     - **Genetic Algorithm (GA)**: Suitable for command-dependent ASR systems.
2. **Disruption of Adversarial Samples**:
   - A method is proposed to disrupt adversarial samples by adding random noise to the input signal. The specific steps include:
     - Generate adversarial samples \( x^* = x + \delta^* \).
     - Add Gaussian noise to the adversarial samples \( \hat{x}^* = x^* + \hat{\delta} \).
     - By adjusting the intensity of the noise, the perturbation of the adversarial samples loses specificity, thereby losing its attack capability.
3. **Detection of Adversarial Samples**:
   - Based on the disruption method, a strategy for detecting adversarial samples is proposed. The specific steps include:
     - Add random noise \( \hat{\delta} \) to the input sample \( x \).
     - Compare the change rate (CR) of the recognition results before and after adding noise.
     - If the change rate exceeds a certain threshold \( K \), the sample is determined to be an adversarial sample.

### Experimental Results

1. **Disruption Effect**:
   - Experimental results show that adding random noise of appropriate intensity can significantly reduce the similarity of adversarial samples while having a minimal impact on normal samples. For example, in the TIMIT and LibriSpeech databases, when the noise intensity is 50, the similarity of adversarial samples drops to 0%, while the similarity of normal samples remains high.
2. **Detection Effect**:
   - In command-dependent ASR systems, by adjusting the noise intensity, the attack success rate of adversarial samples can be significantly reduced without affecting the recognition accuracy of normal samples. Experimental results show that when the noise intensity is above 100, the average attack success rate (ASR avg) of adversarial samples drops below 10%.

### Conclusion

The method proposed in the paper effectively disrupts and detects adversarial samples by adding random noise, enhancing the security and robustness of ASR systems. Experimental results indicate that this method performs well in different types of ASR systems.
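
The disruption and detection steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise intensity is treated as the standard deviation of Gaussian noise on a 16-bit PCM amplitude scale, the change rate (CR) is approximated as one minus a character-level similarity from Python's `difflib`, and `transcribe` is a hypothetical callable standing in for whichever ASR system is being protected. The paper's exact definitions of intensity, similarity, and CR may differ.

```python
import difflib

import numpy as np


def add_random_noise(x, intensity, rng):
    """Return x plus zero-mean Gaussian noise (x_hat = x + delta_hat).

    Assumption: `intensity` is the noise standard deviation on a
    16-bit PCM amplitude scale; the paper may define it differently.
    """
    noise = rng.normal(0.0, intensity, size=x.shape)
    noisy = x.astype(np.float64) + noise
    # Keep samples inside the 16-bit range assumed above.
    return np.clip(noisy, -32768, 32767)


def change_rate(before, after):
    """Approximate CR as 1 - similarity of the two transcripts.

    Assumption: difflib's ratio is a stand-in similarity measure,
    not necessarily the one used in the paper.
    """
    similarity = difflib.SequenceMatcher(None, before, after).ratio()
    return 1.0 - similarity


def is_adversarial(x, transcribe, intensity, threshold, rng):
    """Flag x as adversarial if noise changes its transcript by more than K.

    `transcribe` is a hypothetical function mapping a waveform to text;
    `threshold` plays the role of K in the summary.
    """
    before = transcribe(x)
    after = transcribe(add_random_noise(x, intensity, rng))
    return change_rate(before, after) > threshold
```

In use, one would pass a seeded generator such as `np.random.default_rng(0)` and tune `intensity` and `threshold` on held-out normal and adversarial samples, since a threshold that is too low flags normal speech and one that is too high misses attacks.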