Abstract: Recent advances in deep learning for voice processing have heightened security and privacy concerns. This paper presents an adversarial sample generation technique, the "Ultrasonic Attack," designed to covertly manipulate downstream voice tasks, including voice emotion classification, voice synthesis, and voice recognition processes such as the biometric authentication used at sporting events. The technique is distinguished by a multi-feature fitting strategy that precisely targets and alters the voice attributes most critical to those downstream tasks. By embedding ultrasonic noise in the original vocal recordings, the method misleads deep learning systems while remaining inaudible to the human ear, causing erroneous outcomes in voice-based applications. The implications are particularly serious in high-stakes settings such as athlete identity verification and voice command integrity in sports technology systems. Experimental validation shows the "Ultrasonic Attack" to be an effective method. Compared with state-of-the-art adversarial sample generation techniques, our approach achieves superior performance on tasks including Voice Emotion Classification and Speaker Identification. It produces adversarial samples that attack more effectively while preserving the natural features of the voice, underscoring the need for stronger security in speech-processing technologies.
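To make concrete the idea of adding noise that sits outside the audible band, the following is a minimal sketch and not the paper's method: it omits the multi-feature fitting strategy entirely and simply mixes white noise, band-limited to frequencies above a cutoff, into a waveform. The function name `add_ultrasonic_noise`, its parameters, the 20 kHz cutoff (a commonly cited upper limit of human hearing), and the 96 kHz sample rate in the usage lines are all assumptions chosen for illustration.

```python
import numpy as np


def add_ultrasonic_noise(voice, sample_rate, cutoff_hz=20_000.0,
                         noise_rms=0.01, seed=0):
    """Return voice plus white noise restricted to frequencies >= cutoff_hz.

    Hypothetical helper for illustration only; it does not implement the
    multi-feature fitting described in the abstract.
    """
    voice = np.asarray(voice, dtype=float)
    if sample_rate / 2 <= cutoff_hz:
        # Content above the Nyquist frequency cannot be represented.
        raise ValueError("sample rate too low to carry ultrasonic content")

    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(voice))

    # Zero every frequency bin below the cutoff, leaving only the
    # ultrasonic part of the noise spectrum.
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(voice), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    band_noise = np.fft.irfft(spectrum, n=len(voice))

    # Scale the band-limited noise to the requested RMS amplitude.
    band_noise *= noise_rms / np.sqrt(np.mean(band_noise ** 2))
    return voice + band_noise


# Usage with a synthetic 220 Hz tone standing in for a voice recording.
sample_rate = 96_000
t = np.arange(sample_rate) / sample_rate
voice = 0.5 * np.sin(2 * np.pi * 220.0 * t)
perturbed = add_ultrasonic_noise(voice, sample_rate)
```

In this sketch the perturbation leaves the spectrum below the cutoff untouched, which is one simple way to read "undetectable to the human ear"; whether a given model is actually misled depends on how its front end resamples and filters the input, which the abstract does not specify.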
