Detection of Synthetic Speech Based on Spectrum Defects

Jiacheng Deng, Terui Mao, Diqun Yan, Li Dong, Mingyu Dong
DOI: https://doi.org/10.1145/3552466.3556529
2022-01-01
Abstract: Synthetic spoofing speech has become a threat to online communication and to deep-learning-based automatic speaker verification (ASV) systems, since synthesis models can reproduce anyone's voice. The first Audio Deep Synthesis Detection Challenge (ADD 2022) was launched to spur researchers around the world to develop new technologies that accelerate research on detecting deep-synthesized and manipulated speech. This paper presents a spoofing detection system submitted to the ADD 2022 Track 3.2 detection task (FG-D). The system detects synthetic speech in two parts. First, Mel-frequency cepstral coefficients (MFCCs), linear-frequency cepstral coefficients (LFCCs), delta coefficients, and delta-delta coefficients derived from the speech spectrogram are fed into a DenseNet to build the DenseNet detection system (DDS). Second, three algorithms, a mute segment classifier (MSC), a high-frequency classifier (HFC), and a block spectrogram classifier (BSC), are designed to target defects of synthetic speech in the spectrogram; together they form the spectrum defect detection system (SPECT). The fusion of SPECT and DDS achieves an equal error rate (EER) of 8.5% on ADD FG-D, and our final submission ranks 6th in the evaluation phase of ADD FG-D.
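The abstract reports performance as an equal error rate, the operating point at which the false acceptance rate and the false rejection rate coincide. As a minimal sketch of how such a figure can be computed from detector scores, the Python below sweeps every observed score as a threshold and picks the one where the two rates are closest. It is not the challenge's official scoring script: the function name compute_eer, the convention that a higher score means "more likely genuine", and the toy scores in the usage lines are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two sets of detector scores.

    Assumption: a higher score means the utterance is more likely genuine.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: genuine speech scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: synthetic speech scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; with a finite set of
    # scores we take the closest point and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


# Toy scores (hypothetical, not data from the paper).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Averaging the two rates at the closest point is one common way to handle a finite score set; interpolating between adjacent thresholds is an alternative that can give slightly different values on small evaluation sets.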