Abstract: We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions followed by linear time-invariant filters. The adversarial optimisation procedure acts to minimise the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances. Experiments, performed using three recent ASV systems and the ASVspoof 2019 dataset, show that Malacopula increases vulnerabilities by a substantial margin. However, speech quality is reduced and attacks can be detected effectively under controlled conditions. The findings emphasise the need to identify new vulnerabilities and design defences to protect ASV systems from adversarial attacks in the wild.
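The structure and objective described in the abstract can be written compactly as below. This is a sketch under stated assumptions, not the paper's exact formulation. Branch k is assumed to apply the power x^k. Each LTI filter h_k is assumed to be an FIR filter of length L. e(·) denotes a speaker-embedding extractor, and e_bf denotes the embedding of a bona fide utterance. All of these symbols are introduced here for illustration only.

```latex
% Generalised Hammerstein model: K parallel branches, each a power nonlinearity followed by an FIR filter
y[n] = \sum_{k=1}^{K} \left(h_k * x^{k}\right)[n]
     = \sum_{k=1}^{K} \sum_{m=0}^{L-1} h_k[m] \left(x[n-m]\right)^{k}

% Adversarial objective: bring the embedding of the processed spoofed utterance y close to the bona fide embedding
\min_{h_1, \dots, h_K} \; d_{\cos}\!\left(e(y),\, e_{\mathrm{bf}}\right),
\qquad
d_{\cos}(a, b) = 1 - \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}
```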

What problem does this paper attempt to address?

The paper addresses the vulnerability of automatic speaker verification (ASV) systems to spoofing and deepfake attacks. It introduces Malacopula, a neural-based generalized Hammerstein model that adds adversarial perturbations to spoofed speech samples so that they deceive ASV systems more effectively.

### Main Issues:

1. **Vulnerability of ASV Systems**: ASV performance has improved significantly in recent years, yet these systems remain susceptible to spoofing attacks. Such attacks can be mounted with text-to-speech synthesis and voice conversion. Both produce spoofed speech that is often hard to distinguish from genuine speech, which reduces the reliability of ASV systems.
2. **Adversarial Attacks**: Beyond conventional spoofing, ASV systems also face adversarial attacks. These add small, sometimes imperceptible perturbations that cause ASV systems to make wrong decisions, increasing the false acceptance rate.

### Solution:

- **Malacopula Model**: The paper proposes an adversarial attack based on a neural-based generalized Hammerstein model. The model modifies the speech signal through non-linear processing to make spoofing attacks more effective. It consists of parallel branches, each applying a polynomial function followed by a linear time-invariant (LTI) filter. The filters are optimized to minimize the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances, which increases the success rate of spoofing attacks.

### Experimental Results:

- **Experimental Setup**: The experiments use three ASV systems (CAM++, ECAPA, ERes2Net) and the ASVspoof 2019 dataset to evaluate the effectiveness of Malacopula.
- **Experimental Results**: Malacopula substantially increases the vulnerability of ASV systems to spoofing and deepfake attacks. However, it also degrades speech quality. Under controlled conditions, existing spoofing detection systems (such as AASIST) can detect these attacks effectively.

### Conclusion:

- **Importance**: Malacopula strengthens spoofing attacks, but the perturbations it introduces also reduce speech quality and can be detected under controlled conditions. This underlines the need to keep identifying new vulnerabilities and designing defenses, so that ASV systems remain robust and reliable in practical applications.

Overall, by introducing Malacopula, the paper exposes the vulnerability of ASV systems to adversarial attacks. It also stresses the need for continuous improvement to keep pace with evolving adversarial techniques.
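The optimization described under Solution can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it rests on several stated assumptions:

- Branch k applies the power `x**k` followed by a short FIR filter.
- `toy_embedding` is a hypothetical random projection with `tanh`. It stands in for a real extractor such as CAM++ or ECAPA.
- Finite-difference gradients replace backpropagation, so the example needs only NumPy.

All function names here are hypothetical.

```python
import numpy as np


def hammerstein(x, taps):
    """Assumed generalized Hammerstein form: sum over k of FIR_k(x**k).

    taps has shape (K, L); row k-1 holds the FIR taps applied to x**k.
    """
    y = np.zeros(len(x))
    for k in range(taps.shape[0]):
        # Static power nonlinearity, then a causal FIR filter truncated to len(x).
        y += np.convolve(x ** (k + 1), taps[k])[: len(x)]
    return y


def cosine_distance(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_embedding(signal, proj):
    # HYPOTHETICAL stand-in for a trained speaker-embedding extractor.
    return np.tanh(proj @ signal)


def objective(taps, spoof, target_emb, proj):
    # Cosine distance between the processed spoofed signal and the bona fide target.
    return cosine_distance(toy_embedding(hammerstein(spoof, taps), proj), target_emb)


def optimise(taps, spoof, target_emb, proj, steps=20, lr=0.5, eps=1e-6):
    """Gradient descent on the FIR taps using forward finite-difference gradients.

    A step that does not lower the objective is halved (simple backtracking).
    """
    taps = taps.copy()
    current = objective(taps, spoof, target_emb, proj)
    for _step in range(steps):
        grad = np.zeros_like(taps)
        for idx in np.ndindex(taps.shape):
            bumped = taps.copy()
            bumped[idx] += eps
            grad[idx] = (objective(bumped, spoof, target_emb, proj) - current) / eps
        step = lr
        for _halving in range(30):
            candidate = taps - step * grad
            value = objective(candidate, spoof, target_emb, proj)
            if value < current:
                taps, current = candidate, value
                break
            step /= 2
    return taps, current


rng = np.random.default_rng(0)
spoof = 0.5 * rng.standard_normal(64)      # stand-in "spoofed utterance"
bona_fide = 0.5 * rng.standard_normal(64)  # stand-in "bona fide utterance"
proj = 0.1 * rng.standard_normal((8, 64))
target = toy_embedding(bona_fide, proj)

init_taps = np.zeros((3, 4))  # K = 3 branches, L = 4 taps each
init_taps[0, 0] = 1.0         # identity start: only the linear branch passes x through
before = objective(init_taps, spoof, target, proj)
learned, after = optimise(init_taps, spoof, target, proj)
print(f"cosine distance: {before:.3f} -> {after:.3f}")
```

The taps start at the identity, so the processed signal initially equals the input. Each accepted step then lowers the cosine distance between the toy embedding of the processed "spoofed" signal and the "bona fide" target. In a real attack, the extractor would be a trained ASV model, and the filter taps would more likely be updated with automatic differentiation than with finite differences.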

Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference.

The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems

Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples

Voice Presentation Attack Detection Using Convolutional Neural Networks

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Small-footprint convolutional neural network for spoofing detection

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

Spoofing Speaker Verification With Voice Style Transfer And Reconstruction Loss

Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Learning to Fool the Speaker Recognition

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV

Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise