AdvTTS: Adversarial Text-to-Speech Synthesis Attack on Speaker Identification Systems.

Chu-Xiao Zuo, Zhi-Jun Jia, Wu-Jun Li
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447190
2024-01-01
Abstract: Speaker identification (SI) systems have been widely employed in real-world applications. However, recent research has demonstrated that SI systems are vulnerable to two prevalent attacks even when they provide no feedback to the attacker: the transfer-based adversarial attack and the speech synthesis spoofing attack. The transfer-based adversarial attack faces the challenge of collecting natural speech with the specific content and timbre required. In contrast, the speech synthesis spoofing attack can synthesize speech with any content and timbre, but the synthesized speech can be detected by audio deepfake detectors (ADD). In this paper, we propose a novel method, called adversarial text-to-speech synthesis (AdvTTS), for attacking SI systems. AdvTTS combines the strengths of transfer-based adversarial attacks and speech synthesis spoofing attacks by synthesizing transferable attack speech with local surrogate models. AdvTTS is the first attack method that can conduct both adversarial and spoofing attacks with any speech content and timbre. AdvTTS can deceive SI systems with high-quality speech while evading detection by ADD. Experiments show that AdvTTS outperforms the other baselines for spoofing attacks and, when combined with projected gradient descent (PGD), outperforms the baselines for adversarial attacks.
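For readers unfamiliar with PGD, the following is a minimal sketch of a generic targeted L-infinity PGD attack against a local surrogate model. It is not the AdvTTS method, whose details the abstract does not give. The two-speaker linear "surrogate", the four-sample "waveform", and the names `pgd_targeted`, `eps`, `alpha`, and `steps` are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(weights, x):
    """Scores of a toy linear surrogate model: one weight row per speaker."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def pgd_targeted(weights, x0, target, eps=0.05, alpha=0.01, steps=20):
    """Targeted L-infinity PGD on a toy linear surrogate (illustrative only).

    `weights` stands in for a surrogate SI model and `x0` for a waveform
    with samples in [-1, 1]; both are assumptions, not the paper's setup.
    """
    x = list(x0)
    for _ in range(steps):
        probs = softmax(logits(weights, x))
        # Gradient of cross-entropy(target) w.r.t. x for a linear model:
        # sum_k (p_k - 1[k == target]) * weights[k]
        grad = [0.0] * len(x)
        for k, row in enumerate(weights):
            coeff = probs[k] - (1.0 if k == target else 0.0)
            for i, w in enumerate(row):
                grad[i] += coeff * w
        for i in range(len(x)):
            sign = (grad[i] > 0) - (grad[i] < 0)
            x[i] -= alpha * sign  # signed step that lowers the targeted loss
            x[i] = min(max(x[i], x0[i] - eps), x0[i] + eps)  # project onto eps-ball
            x[i] = min(max(x[i], -1.0), 1.0)  # keep a valid amplitude
    return x


# Hypothetical surrogate with two speakers and a four-sample "waveform".
weights = [[1.0, -1.0, 1.0, -1.0],
           [-1.0, 1.0, -1.0, 1.0]]
x0 = [0.2, -0.1, 0.3, 0.0]
target = 1

x_adv = pgd_targeted(weights, x0, target)
p_before = softmax(logits(weights, x0))[target]
p_after = softmax(logits(weights, x_adv))[target]
max_change = max(abs(a - b) for a, b in zip(x_adv, x0))
```

In this toy setting the perturbation stays within `eps` of the original input while the surrogate's probability for the target speaker increases. A transfer-based attack would then present `x_adv` to a separate victim model that the attacker cannot query.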