Defend Data Poisoning Attacks on Voice Authentication

Ke Li, Cameron Baird, Dan Lin

DOI: https://doi.org/10.1109/TDSC.2023.3289446

2023-07-08

Abstract: With the advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a type of biometric authentication option in many scenes of our daily life, especially the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient as they relieve people from memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack to the voice authentication system, which can hardly be captured by existing defense mechanisms. Thus, we propose a more robust defense method, called Guardian, which is a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, which is much more effective than existing approaches with only 60% accuracy.

Cryptography and Security, Artificial Intelligence, Machine Learning, Sound, Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses data poisoning attacks on voice authentication systems. It first demonstrates an easy-to-implement attack in which the attacker injects or replaces a user's audio files during registration or account update. The poisoned system then accepts the attacker's voice as legitimate, so the attacker can log in to the victim's web account using their own voice, and the victim may not notice until the damage is done. To counter this attack, the paper proposes Guardian, a discriminator based on a convolutional neural network (CNN). Guardian combines several novel techniques, including bias reduction, input augmentation, and ensemble learning, and distinguishes attacked accounts from normal accounts with about 95% accuracy, compared with only 60% for existing approaches. The method is described as applicable to any voice authentication model based on a deep neural network (DNN). In the experiments, the popular Deep Speaker model is chosen as the attack target because of its very high voice recognition accuracy (95%).
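To make the attack idea concrete, here is a minimal toy sketch of why poisoned enrollment data can let an attacker in. It is not the paper's pipeline and does not use Deep Speaker. It assumes a simplified verifier that accepts a probe when its cosine similarity to any enrolled embedding reaches a threshold. The 3-dimensional embeddings, the 0.8 threshold, and the names `cosine` and `verify` are all hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrolled, probe, threshold=0.8):
    """Accept the probe if it is close enough to ANY enrolled embedding.

    Assumption: a simplified matching rule, not the rule used in the paper.
    """
    return max(cosine(e, probe) for e in enrolled) >= threshold


# Hypothetical speaker embeddings (real systems use far more dimensions).
victim_enrollment = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
attacker_voice = [0.1, 0.9, 0.2]      # sample injected at enrollment/update
attacker_probe = [0.12, 0.88, 0.25]   # attacker's later login attempt
victim_probe = [0.85, 0.15, 0.05]     # victim's later login attempt

# Clean account: the attacker's voice does not match the victim's samples.
before = verify(victim_enrollment, attacker_probe)

# Poisoned account: one attacker sample was injected into the enrollment set.
poisoned_enrollment = victim_enrollment + [attacker_voice]
after = verify(poisoned_enrollment, attacker_probe)

# The victim can still log in, so nothing looks wrong from their side.
victim_still_ok = verify(poisoned_enrollment, victim_probe)

print(before, after, victim_still_ok)
```

In this toy setting the attacker is rejected on the clean account and accepted on the poisoned one, while the victim's own login keeps working, which mirrors why the victim may not notice the attack. A defense in the spirit of Guardian would inspect the enrollment data of an account and flag it as attacked; how it does so is specific to the paper and is not reproduced here.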

Securing Voice Authentication Applications Against Targeted Data Poisoning

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview

Voice Presentation Attack Detection Using Convolutional Neural Networks

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

Defending Adversarial Attacks on Cloud-aided Automatic Speech Recognition Systems

Stop Deceiving! An Effective Defense Scheme Against Voice Impersonation Attacks on Smart Devices

Defending Against Adversarial Attacks in Speaker Verification Systems

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning

Data Poisoning and Backdoor Attacks on Audio Intelligence Systems

WaveFuzz: A Clean-Label Poisoning Attack to Protect Your Voice

Securing Voice Biometrics: One-Shot Learning Approach for Audio Deepfake Detection

Backdoor Defence for Voice Print Recognition Model Based on Speech Enhancement and Weight Pruning

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

You Can Hear But You Cannot Steal: Defending Against Voice Impersonation Attacks on Smartphones

A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

Voice Adversarial Sample Generation Method for Ultrasonicization of Motion Noise

One-class Learning Towards Synthetic Voice Spoofing Detection

Accuth: Anti-Spoofing Voice Authentication Via Accelerometer