Massimiliano Todisco, Michele Panariello, Xin Wang, Héctor Delgado, Kong Aik Lee, Nicholas Evans
Abstract: We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions followed by linear time-invariant filters. The adversarial optimisation procedure acts to minimise the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances. Experiments, performed using three recent ASV systems and the ASVspoof 2019 dataset, show that Malacopula increases vulnerabilities by a substantial margin. However, speech quality is reduced and attacks can be detected effectively under controlled conditions. The findings emphasise the need to identify new vulnerabilities and design defences to protect ASV systems from adversarial attacks in the wild.
What problem does this paper attempt to address?
The paper investigates how vulnerable automatic speaker verification (ASV) systems are to spoofing and deepfake attacks once those attacks are strengthened with adversarial perturbations. It introduces Malacopula, a neural-based generalized Hammerstein model that perturbs spoofed speech utterances so that they better deceive ASV systems.
### Main Issues:
1. **Vulnerability of ASV Systems**: Although the performance of ASV systems has improved significantly in recent years, they remain susceptible to spoofing attacks. Such attacks can be mounted with text-to-speech synthesis and voice conversion, which generate spoofed speech that is often difficult to distinguish from real speech and so reduce the reliability of ASV systems.
2. **Adversarial Attacks**: Beyond traditional spoofing attacks, ASV systems also face adversarial attacks. These add small, sometimes imperceptible perturbations to the input that lead the ASV system to wrong decisions, which increases the false acceptance rate.
### Solution:
- **Malacopula Model**: The paper proposes an adversarial attack based on a neural-based generalized Hammerstein model, which modifies the speech signal through non-linear processing to make spoofing attacks more effective. The model comprises parallel branches, each a polynomial function followed by a linear time-invariant filter. By optimizing these filters, the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances is minimized, thereby increasing the success rate of spoofing attacks.
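
The structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that branch `k` raises the input to the power `k`, that each linear time-invariant filter is a short causal FIR filter, and that the branch outputs are summed. The function names (`fir_filter`, `hammerstein`, `cosine_distance`) and the hand-picked filter taps are hypothetical; in the paper the filters are optimized rather than fixed, and the embeddings come from an ASV system rather than from toy vectors.

```python
import math


def fir_filter(signal, taps):
    """Causal FIR filter (one kind of LTI filter): y[n] = sum_j taps[j] * x[n - j]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            if n - j >= 0:  # samples before the start of the signal count as zero
                acc += h * signal[n - j]
        out.append(acc)
    return out


def hammerstein(signal, branch_taps):
    """Generalized Hammerstein sketch (assumed form): branch k applies the
    polynomial x -> x**k (k = 1..K), then its own FIR filter; branches are summed."""
    out = [0.0] * len(signal)
    for k, taps in enumerate(branch_taps, start=1):
        powered = [x ** k for x in signal]      # static non-linearity
        filtered = fir_filter(powered, taps)    # linear time-invariant stage
        out = [o + f for o, f in zip(out, filtered)]
    return out


def cosine_distance(a, b):
    """1 - cosine similarity; the quantity the adversarial optimisation minimizes
    between spoofed and bona fide speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    spoofed = [0.5, -0.25, 1.0, 0.0]
    # Two branches: a one-sample delay on x, and a pass-through on x**2.
    perturbed = hammerstein(spoofed, [[0.0, 1.0], [1.0]])
    print(perturbed)
    # Toy stand-ins for speaker embeddings (a real system would extract these).
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

With a first branch of `[1.0]` and all other branches zero, the sketch returns its input unchanged, which makes a convenient starting point before any optimisation of the filter taps.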
### Experimental Results:
- **Experimental Setup**: The paper conducted experiments using three different ASV systems (CAM++, ECAPA, ERes2Net) and the ASVspoof 2019 dataset to verify the effectiveness of the Malacopula model.
- **Findings**: Malacopula increases the vulnerability of ASV systems to spoofing and deepfake attacks by a substantial margin. However, speech quality is reduced, and under controlled conditions the attacks can be detected effectively by existing spoofing detection systems (such as AASIST).
### Conclusion:
- **Importance**: Although Malacopula performs well in enhancing the effect of spoofing attacks, the perturbations it introduces also reduce speech quality and can be detected under controlled conditions. This emphasizes the need to continuously identify new vulnerabilities and design defense measures to ensure the robustness and reliability of ASV systems in practical applications.
Overall, the paper reveals the vulnerability of ASV systems to adversarial attacks by introducing the Malacopula model and emphasizes the importance of continuous improvement and adaptation to cope with the evolving adversarial techniques.