Ghost-in-Wave: How Speaker-Irrelative Features Interfere DeepFake Voice Detectors

Xuan Hai, Xin Liu, Zhaorun Chen, Yuan Tan, Song Li, Weina Niu, Gang Liu, Rui Zhou, Qingguo Zhou
DOI: https://doi.org/10.1109/icme57554.2024.10688273
2024-01-01
Abstract: Recent speech synthesis technology can generate high-quality speech that is indistinguishable from human speech, introducing various security and privacy risks. Numerous recent studies have focused on fake voice detection to address these risks, and many claim near-ideal performance. However, is this really the case? A recent study introduced Speaker-Irrelative-Features (SiFs), which are unrelated to the information in speech files but can still influence fake voice detectors. This means that existing detectors may rely on SiFs to some extent to distinguish real speech from fake speech. In this paper, we introduce an evaluation framework to assess in depth the influence of SiFs on existing fake voice detectors. We evaluate three SiFs, namely background noise, the mute parts before and after the voice, and the sampling rate, on ASVspoof2019 and FoR. Our results confirm the substantial influence of SiFs on fake voice detection performance, and we analyze the underlying mechanisms.
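
The abstract does not describe how the evaluation framework manipulates each SiF, so the following is only a minimal sketch of what perturbing the three named SiFs on a raw waveform might look like. The function names (`add_noise`, `trim_silence`, `resample_linear`), the amplitude threshold, and the synthetic test clip are all hypothetical; white Gaussian noise and linear interpolation are simple stand-ins and are not claimed to be the authors' choices.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB (assumed noise model)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def trim_silence(wave, threshold=1e-3):
    """Drop leading and trailing samples whose magnitude does not exceed the threshold."""
    voiced = np.flatnonzero(np.abs(wave) > threshold)
    if voiced.size == 0:
        return wave[:0]  # nothing above the threshold: return an empty clip
    return wave[voiced[0]: voiced[-1] + 1]


def resample_linear(wave, orig_sr, target_sr):
    """Change the sampling rate by linear interpolation (crude: no anti-aliasing filter)."""
    n_out = int(round(len(wave) * target_sr / orig_sr))
    t_in = np.arange(len(wave)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, wave)


# Hypothetical clip: 0.1 s of silence, 0.5 s of a 220 Hz tone, 0.1 s of silence.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(8000) / sr)
clip = np.concatenate([np.zeros(1600), tone, np.zeros(1600)])

rng = np.random.default_rng(0)
noisy = add_noise(clip, snr_db=20, rng=rng)      # background-noise variant
trimmed = trim_silence(clip)                     # variant without the mute parts
downsampled = resample_linear(clip, sr, 8000)    # sampling-rate variant
```

Under these assumptions, each variant would be scored by the same detector as the unmodified clip, and a large shift in the detector's output would suggest that it is sensitive to that SiF rather than to the voice itself.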