Abstract: Voice spoofing attempts to break into a specific automatic speaker verification (ASV) system by forging a user's voice, and can be carried out through text-to-speech (TTS), voice conversion (VC), and replay attacks. Deep learning-based voice spoofing countermeasures have recently been developed. For replay attacks, however, large datasets are difficult to construct because every sample requires a physical recording process. To overcome this problem, this study proposes a pre-training framework based on multi-order acoustic simulation for replay voice spoofing detection. Multi-order acoustic simulation uses existing clean-signal and room impulse response (RIR) datasets to generate audio that simulates the various acoustic configurations of original and replayed audio. Here, the acoustic configuration refers to factors such as the microphone type, reverberation, time delay, and noise that may arise between a speaker and a microphone during recording. We hypothesize that a deep learning model trained on audio simulating these configurations can distinguish the acoustic configurations of original and replayed audio. To test this, we pre-trained a model to classify the audio generated by the multi-order acoustic simulation into three classes: clean signal, audio simulating the acoustic configuration of original audio, and audio simulating the acoustic configuration of replayed audio. We then used the weights of the pre-trained model as the initial weights of the replay voice spoofing detection model and fine-tuned it on an existing replay voice spoofing dataset. To validate the effectiveness of the proposed method, we compared the conventional method without pre-training against the proposed method using two objective metrics, accuracy and F1-score. The conventional method achieved an accuracy of 92.94% and an F1-score of 86.92%, whereas the proposed method achieved an accuracy of 98.16% and an F1-score of 95.08%.
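
The three-class pre-training data described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that a "first-order" simulation is one recording pass (a single RIR convolution plus noise) standing in for original audio, and that a "second-order" simulation is two successive passes standing in for replayed audio. The function names, the label numbering, and the Gaussian noise model are hypothetical, and plain Python is used only to keep the example dependency-free; this is not the authors' implementation.

```python
import random


def convolve(signal, kernel):
    """Full linear convolution of two sequences (plain Python, O(n*m))."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def simulate_recording(signal, rir, noise_std=0.0, seed=0):
    """One simulated recording pass (assumed model, not from the paper):
    convolve with an RIR, truncate to the input length, add Gaussian noise."""
    rng = random.Random(seed)
    wet = convolve(signal, rir)[: len(signal)]
    return [x + rng.gauss(0.0, noise_std) for x in wet]


def make_three_class_examples(clean, rir_a, rir_b, noise_std=0.0):
    """Build one (audio, label) pair per pre-training class.

    Hypothetical labels: 0 = clean signal, 1 = simulated original audio
    (one pass), 2 = simulated replayed audio (two successive passes).
    """
    original_like = simulate_recording(clean, rir_a, noise_std)
    replay_like = simulate_recording(original_like, rir_b, noise_std)
    return [(clean, 0), (original_like, 1), (replay_like, 2)]


# Toy inputs standing in for a clean utterance and two RIRs.
clean = [1.0, 0.5, -0.25, 0.0, 0.75]
rir_a = [1.0, 0.0, 0.3]
rir_b = [0.8, 0.2]
examples = make_three_class_examples(clean, rir_a, rir_b)
```

In practice the clean signals and RIRs would come from existing datasets, as the abstract states, and the convolution would be done with an optimized numerical library. A classifier pre-trained on these three labels would then supply the initial weights for the replay detection model before fine-tuning.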

A One-class Model for Voice Replay Attack Detection

Fast and Lightweight Voice Replay Attack Detection Via Time-frequency Spectrum Difference

Anti-Replay: A Fast and Lightweight Voice Replay Attack Detection System

Cross-database replay detection in terminal-dependent speaker verification

A Real-Time Detection Approach Against Video-Replay Attack in Face Recognition

One-class Learning Towards Synthetic Voice Spoofing Detection

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Voice Spoofing Countermeasure for Voice Replay Attacks Using Deep Learning

An Experimental Study on Replay Attack Detection Using Spoofing Clues from both Voiced and Non-Voiced Segments

A Defense Method Based on a Novel Replay Attack

Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019

Audio compression-assisted feature extraction for voice replay attack detection

Towards Vulnerability Analysis of Voice-Driven Interfaces and Countermeasures for Replay

Voice Presentation Attack Detection Using Convolutional Neural Networks

A Pre-Training Framework Based on Multi-Order Acoustic Simulation for Replay Voice Spoofing Detection

Replay attack detection using variable-frequency resolution phase and magnitude features

Stop Deceiving! An Effective Defense Scheme Against Voice Impersonation Attacks on Smart Devices

When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

Voice spoofing detection with raw waveform based on Dual Path Res2net

An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification