Defend Data Poisoning Attacks on Voice Authentication

Ke Li, Cameron Baird, Dan Lin
DOI: https://doi.org/10.1109/TDSC.2023.3289446
2023-07-08
Abstract: With the advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a type of biometric authentication option in many scenes of our daily life, especially the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient as they relieve people from memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack on the voice authentication system, which can hardly be captured by existing defense mechanisms. Thus, we propose a more robust defense method, called Guardian, which is a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, which is much more effective than existing approaches with only 60% accuracy.
Cryptography and Security, Artificial Intelligence, Machine Learning, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses data poisoning attacks on voice authentication systems. It first demonstrates an easy-to-implement attack in which the attacker injects or replaces a user's audio files during registration or an account update. The poisoned enrollment data causes the system to accept the attacker's voice as legitimate, so the attacker can log in to the victim's web account with their own voice, and the victim may not notice until the damage is done. To counter the attack, the paper proposes Guardian, a discriminator based on a convolutional neural network (CNN). Guardian combines several novel techniques, including bias reduction, input augmentation, and ensemble learning, and distinguishes attacked accounts from normal accounts with about 95% accuracy, compared with only 60% for existing approaches. The method applies to any voice authentication model based on a deep neural network (DNN). The experiments target the popular Deep Speaker model because of its very high voice recognition accuracy (95%).
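The effect of poisoned enrollment audio can be illustrated with a toy sketch. It is not the paper's attack pipeline or the Deep Speaker model: it assumes, for illustration only, that a voiceprint is the mean of the enrollment embeddings and that verification is a cosine-similarity threshold. The 3-dimensional vectors, the 0.7 threshold, and the function names `enroll` and `verify` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(embeddings):
    """Assumed enrollment rule: average the embeddings into one voiceprint."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def verify(voiceprint, probe, threshold=0.7):
    """Assumed verification rule: accept if similarity reaches the threshold."""
    return cosine(voiceprint, probe) >= threshold

# Hand-made stand-ins for speaker embeddings (not real model outputs).
victim_samples = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0]]
attacker_samples = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0]]
victim_probe = [0.95, 0.05, 0.05]
attacker_probe = [0.05, 1.0, 0.05]

# Clean account: only the victim's audio is enrolled.
clean = enroll(victim_samples)
# Poisoned account: attacker audio is injected at registration or update time.
poisoned = enroll(victim_samples + attacker_samples)

for name, voiceprint in (("clean", clean), ("poisoned", poisoned)):
    print(name,
          "victim accepted:", verify(voiceprint, victim_probe),
          "attacker accepted:", verify(voiceprint, attacker_probe))
```

In this toy setup the poisoned voiceprint sits between the two speakers, so both the victim and the attacker pass verification. That matches the observation that the victim may not notice the compromise, and it suggests why a defense such as Guardian has to inspect the enrolled data itself rather than rely on login outcomes.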