What problem does this paper attempt to address?

This paper addresses the security and robustness of deep neural network (DNN)-driven voice authentication systems against targeted data-poisoning attacks. Specifically:

1. **Privacy and security risks**: Voice authentication systems rely on users' unique voiceprint features for identity verification, but their training data is vulnerable to tampering. If an attacker replaces a legitimate user's voice samples with their own, the system may learn the attacker's features and grant unauthorized access.
2. **Limitations of existing defense mechanisms**:
   - Existing defenses fall short in both efficiency and accuracy.
   - They have difficulty simulating real-world attack scenarios, so they are not effective in actual deployment.
   - Existing research lacks an in-depth understanding of the case where multiple attackers try to impersonate legitimate users and manipulate their profiles at the same time.
3. **The impact of small-proportion data poisoning**: Previous research has not fully explored how effective covert poisoning attacks are when only a small part of the user data (for example, 0.1% to 10%) is tampered with.

To address these problems, the paper proposes an enhanced framework that combines convolutional neural network (CNN) and K-nearest neighbor (KNN) models to improve the defense of voice authentication systems against targeted data-poisoning attacks. The framework was evaluated on real-world datasets and is reported to be robust and effective against small-proportion data-poisoning attacks.

### Formula representation

The formulas mentioned in the paper are as follows:

- \( p_n(z_n|X_n) \) is the probability distribution of normal feature vectors \( z_n \) given \( X_n \), where \( X_n \) is a subset of the input audio of normal accounts.
- \( p_a(z_a|X_a) \) is the probability distribution of attacked feature vectors \( z_a \) given \( X_a \), where \( X_a \) is a subset of the input audio of attacked accounts.

The assumed expression is:

\[ p_n(z_n|X_n)=p_a(z_a|X_a), \quad \text{but} \quad p(X_n)\neq p(X_a) \]

This assumption states that although feature vectors from normal and attacked accounts follow the same conditional distribution given their inputs, the distributions of the input audio itself differ. It reflects that an attacker can shift the system's decision boundary by replacing only a small amount of data, thereby gaining unauthorized access.

### Summary

This paper aims to strengthen the defense of voice authentication systems against targeted data-poisoning attacks by proposing a new framework, especially for the case where only a small proportion of the data is tampered with. The experimental results reported in the paper show that the framework outperforms existing methods in accuracy and robustness and can handle real-world attack scenarios.
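### Illustrative sketch

The attack described above (replacing a small share of a victim's samples with the attacker's own) and a KNN-style check on the resulting features can be sketched in a few lines. This is not the paper's framework: synthetic Gaussian clusters stand in for CNN speaker embeddings, the attacker is assumed to also own an account in the training set, and the detection rule (flag a sample when most of its nearest neighbours carry a different account label) is a generic illustration. All function names and parameter values are hypothetical.

```python
import numpy as np


def make_embeddings(rng, n_users=5, per_user=200, dim=16, spread=0.5):
    """Synthetic stand-in for CNN embeddings: one Gaussian cluster per user."""
    centers = rng.normal(0.0, 3.0, size=(n_users, dim))
    feats = np.concatenate(
        [c + spread * rng.normal(size=(per_user, dim)) for c in centers]
    )
    labels = np.repeat(np.arange(n_users), per_user)
    return feats, labels


def poison(rng, feats, labels, victim, attacker, rate):
    """Replace a fraction `rate` of the victim's samples with attacker-like
    samples while keeping the victim's label (targeted poisoning)."""
    feats = feats.copy()
    victim_idx = np.flatnonzero(labels == victim)
    attacker_idx = np.flatnonzero(labels == attacker)
    n_poison = max(1, int(round(rate * victim_idx.size)))
    replaced = rng.choice(victim_idx, size=n_poison, replace=False)
    donors = rng.choice(attacker_idx, size=n_poison, replace=False)
    # Small jitter so the poisoned samples are not exact copies of the donors.
    jitter = 0.05 * rng.normal(size=(n_poison, feats.shape[1]))
    feats[replaced] = feats[donors] + jitter
    return feats, replaced


def knn_flag(feats, labels, k=7):
    """Flag samples whose k nearest neighbours mostly carry another label."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T  # squared distances
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argpartition(d2, k, axis=1)[:, :k]  # indices of the k closest
    agree = (labels[nn] == labels[:, None]).mean(axis=1)
    return agree < 0.5


def run_demo(rate=0.05, seed=0):
    """Poison one account at the given rate and score the KNN check."""
    rng = np.random.default_rng(seed)
    feats, labels = make_embeddings(rng)
    poisoned, replaced = poison(rng, feats, labels, victim=0, attacker=1, rate=rate)
    flags = knn_flag(poisoned, labels)
    recall = flags[replaced].mean()  # share of poisoned samples caught
    clean = np.ones(labels.size, dtype=bool)
    clean[replaced] = False
    false_positives = int(flags[clean].sum())  # clean samples wrongly flagged
    return recall, false_positives


if __name__ == "__main__":
    for rate in (0.01, 0.05, 0.10):
        recall, fp = run_demo(rate=rate)
        print(f"rate={rate:.2f} recall={recall:.2f} false_positives={fp}")
```

On well-separated synthetic clusters the poisoned samples sit inside the attacker's cluster, so their neighbours disagree with the victim label and the check is easy. Real voice embeddings overlap far more, which is why a result on this toy data says nothing about the accuracy reported in the paper.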

Securing Voice Authentication Applications Against Targeted Data Poisoning
