Robust Audio Anti-Spoofing System Based on Low-Frequency Sub-Band Information

Menglu Li, Xiao-Ping Zhang
DOI: https://doi.org/10.1109/waspaa58266.2023.10248132
2023-01-01
Abstract: Current audio anti-spoofing systems usually rely on computationally complex architectures and do not reveal the fundamental discriminative factors behind their detection decisions. State-of-the-art systems also depend heavily on voice information, which may leave them vulnerable as spoofing algorithms further improve the quality of fake speech. We therefore conduct a series of experiments on different frequency sub-bands to investigate the underlying discriminative features. We find that the lowest frequency sub-band, from 0 to 1600 Hz, contains the most critical features for distinguishing Deepfake from real speech. We also examine forensic evidence and find that the detectors’ judgments are based on the non-speech parts of the audio samples. Based on these findings, our single detection system, with only 57K parameters and using a one-tenth segment of the entire spectrogram as input, demonstrates its robustness by outperforming all official baselines of the ASVspoof2021 DF track. Our lightweight system can be readily applied in practical use cases such as automated Deepfake screening or protecting voice-enabled devices.
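To make the sub-band idea concrete, the following is a minimal sketch of restricting a magnitude spectrogram to the 0 to 1600 Hz band. It is not the authors' front end: the abstract does not specify the spectrogram type or its parameters, so the function name `low_band_spectrogram`, the 16 kHz sample rate, the 512-point FFT, the 160-sample hop, and the Hann window are all assumptions chosen for illustration.

```python
import numpy as np

def low_band_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160,
                         cutoff_hz=1600.0):
    """Return (band, freqs): STFT magnitude rows at or below cutoff_hz.

    All default parameters are illustrative assumptions, not values
    taken from the paper.
    """
    window = np.hanning(n_fft)
    # Number of full frames that fit in the signal (no padding).
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrogram with shape (n_fft // 2 + 1, n_frames).
    spec = np.abs(np.fft.rfft(frames, axis=1)).T
    # Centre frequency in Hz of each FFT bin.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Keep only the bins inside the low-frequency sub-band.
    keep = freqs <= cutoff_hz
    return spec[keep], freqs[keep]

# Usage: one second of a 440 Hz tone as a stand-in for real audio.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
band, band_freqs = low_band_spectrogram(tone)
print(band.shape, band_freqs[-1])
```

Under these assumed settings the retained band covers 52 of the 257 frequency bins. How the paper arrives at a one-tenth segment of the full spectrogram is not stated in the abstract, so this sketch only illustrates the frequency cut.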