Partially Fake Audio Detection Based on MOSNet with Pretraining Models

Hanyue Liu, Jianqian Zhang, Jing Wang, Miao Liu, Liang Xu, Yi Sun
DOI: https://doi.org/10.1109/ACAIT60137.2023.10528491
2023-01-01
Abstract: The rapid development of speech synthesis and voice conversion technologies poses many potential risks to people's information security and privacy. It is therefore important to build techniques that identify manipulated regions in audio. In this paper, we propose a novel partially fake audio detection system based on MOSNet, a speech quality assessment network, and pretraining models. We compare features extracted by pretraining models with Mel-spectrogram features. Experimental results show that the proposed system combining MOSNet with the XLS-R-300m pretraining model performs best on both the evaluation set and the test set, and generalizes well. On the test set, the final score of the proposed system is 5.97% higher than that of the baseline system based on RawNet.
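To make the task of identifying manipulated regions concrete, the following is a minimal sketch of one common post-processing step: turning per-frame fake scores into time regions. It is not the authors' method. The function name `frames_to_regions`, the 0.5 threshold, and the 20 ms frame shift are assumptions chosen for illustration, and the scores are made-up values standing in for the output of some frame-level detector.

```python
from typing import List, Tuple


def frames_to_regions(
    scores: List[float],
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
    frame_shift_ms: int = 20, # assumed frame shift, not from the paper
) -> List[Tuple[int, int]]:
    """Merge consecutive frames scored as fake into (start_ms, end_ms) regions."""
    regions: List[Tuple[int, int]] = []
    start = None  # index of the first frame in the currently open region
    for i, score in enumerate(scores):
        is_fake = score >= threshold
        if is_fake and start is None:
            start = i  # a new manipulated region begins at this frame
        elif not is_fake and start is not None:
            # the open region ends just before this genuine frame
            regions.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None
    if start is not None:
        # the audio ended while a region was still open
        regions.append((start * frame_shift_ms, len(scores) * frame_shift_ms))
    return regions


# Hypothetical per-frame scores; higher means more likely manipulated.
example_scores = [0.1, 0.9, 0.8, 0.2, 0.7]
example_regions = frames_to_regions(example_scores)
```

With the hypothetical scores above, frames 1 to 2 and frame 4 exceed the threshold, so the sketch reports two regions in milliseconds. Integer milliseconds are used to avoid floating-point rounding in the region boundaries.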