CBC-Based Synthetic Speech Detection

Jichen Yang, Qianhua He, Yongjian Hu, Weiqiang Pan
DOI: https://doi.org/10.4018/ijdcf.2019040105
2019-01-01
International Journal of Digital Crime and Forensics
Abstract: In previous studies of synthetic speech detection (SSD), the most widely used features are based on a linear power spectrum. Unlike these conventional methods, this article proposes a new feature extraction method for SSD based on the octave power spectrum obtained from the constant-Q transform (CQT). Combining CQT, block transform (BT) and discrete cosine transform (DCT) yields a new feature called constant-Q block coefficients (CBC). CQT transforms speech from the time domain into the frequency domain, BT segments the octave power spectrum into many blocks, and DCT extracts the principal information of each block. Experimental results on the ASVspoof 2015 corpus show that CBC outperforms the other front-end features that have been benchmarked on the ASVspoof 2015 evaluation set in terms of equal error rate (EER).
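The CQT, block transform and DCT chain described in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The CQT is evaluated directly (Brown-style, one Hamming-windowed kernel per bin) rather than with a fast algorithm. Each block is assumed to be one octave of log-power CQT bins within a single frame. The helper names (`cqt_kernels`, `cqt_log_power`, `dct_ii`, `cbc_features`) and parameter values (`f_min`, `bins_per_octave`, `n_octaves`, `hop`, `n_keep`) are hypothetical. The abstract does not state the block sizes or how many DCT coefficients are kept.

```python
import cmath
import math


def cqt_kernels(fs, f_min, bins_per_octave, n_bins):
    """Build one Hamming-windowed complex kernel per constant-Q bin (hypothetical helper)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant ratio f_k / bandwidth
    kernels = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometrically spaced centre frequency
        n_k = max(2, math.ceil(q * fs / f_k))  # window shortens as frequency rises
        kernels.append([
            (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_k - 1)))  # Hamming window
            * cmath.exp(-2j * math.pi * f_k * n / fs) / n_k
            for n in range(n_k)
        ])
    return kernels


def cqt_log_power(x, start, kernels, eps=1e-10):
    """Log power of each CQT bin for the frame beginning at sample `start`."""
    out = []
    for kern in kernels:
        acc = 0j
        for n, c in enumerate(kern):
            idx = start + n
            if idx >= len(x):
                break  # samples past the end of the signal count as zeros
            acc += c * x[idx]
        out.append(math.log(abs(acc) ** 2 + eps))
    return out


def dct_ii(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def cbc_features(x, fs, f_min=62.5, bins_per_octave=12, n_octaves=4,
                 hop=512, n_keep=4):
    """Per-frame CBC-style vectors: octave blocks of log CQT power -> DCT -> truncate."""
    kernels = cqt_kernels(fs, f_min, bins_per_octave, bins_per_octave * n_octaves)
    frames = []
    for start in range(0, len(x), hop):
        logp = cqt_log_power(x, start, kernels)
        feat = []
        for o in range(n_octaves):  # block transform: one octave of bins per block
            block = logp[o * bins_per_octave:(o + 1) * bins_per_octave]
            feat.extend(dct_ii(block)[:n_keep])  # keep the low-order coefficients
        frames.append(feat)
    return frames


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(2000)]
    feats = cbc_features(tone, fs, hop=500)
    print(len(feats), len(feats[0]))  # 4 frames, 4 octaves x 4 coefficients each
```

Truncating each block's DCT to its first few coefficients keeps the block's coarse spectral shape and discards fine detail. That is one plausible reading of "extracts principal information"; the paper's exact choice may differ.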