Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge.
Chao Zhang, Junjie Cheng, Yanmei Gu, Huacan Wang, Jun Ma, Shaojun Wang, Jing Xiao
DOI: https://doi.org/10.21437/interspeech.2020-1044
2020-01-01
Abstract: In this paper we describe a novel replay detection system for the ASVspoof 2019 challenge. The objective of this challenge is to classify arbitrary audio files as either bona fide speech or spoofing attacks, where spoofing attacks include replay attacks, text-to-speech, and voice conversion. Our replay detection system is a pipeline with three parts: feature engineering, DNN models, and score fusion. First, following previous research, logspec features are extracted as input, and spectrum augmentation is applied during training to boost performance under limited training data. Second, the DNN part comprises three models: SENet, DenseNet, and our proposed model, channel consistency DenseNeXt, all trained with binary cross-entropy loss and center loss as objectives. Finally, score fusion is applied to all three DNN models to obtain the primary system results. Experimental results show that our best single system, channel consistency DenseNeXt, achieves a t-DCF of 0.0137 and an EER of 0.46% on the physical access (PA) evaluation set. The primary system achieves a t-DCF of 0.00785 and an EER of 0.282%. This is a 96.8% improvement over the baseline system CQCC-GMM, and it achieves state-of-the-art performance in the PA challenge.
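
The abstract names spectrum augmentation and score fusion but does not give their details. The sketch below illustrates one common form of each, under stated assumptions: augmentation is taken to mean masking one random frequency band and one random time band of the input feature matrix, and fusion is taken to be a weighted average of per-utterance scores. The function names `mask_spectrogram` and `fuse_scores`, the mask widths, and the equal default weights are hypothetical and are not taken from the paper.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Return a masked copy of `spec`.

    `spec` is a list of frequency rows, each a list of time frames.
    One random frequency band and one random time band are set to `fill`.
    Assumption: this masking scheme stands in for the paper's
    "spectrum augmentation", whose exact form the abstract does not state.
    """
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Mask a band of consecutive frequency bins (width may be zero).
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_time

    # Mask a band of consecutive time frames (width may be zero).
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill
    return out


def fuse_scores(scores_per_model, weights=None):
    """Weighted average of per-utterance scores from several models.

    Assumption: simple linear fusion; the abstract does not say how
    the three model scores are combined.
    """
    n_models = len(scores_per_model)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    n_utts = len(scores_per_model[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, scores_per_model))
        for i in range(n_utts)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_spec = [[1.0] * 10 for _ in range(8)]  # 8 frequency bins x 10 frames
    augmented = mask_spectrogram(toy_spec, max_freq_width=3, max_time_width=4, rng=rng)
    for row in augmented:
        print(row)
    # Hypothetical scores from three models for two utterances.
    print(fuse_scores([[0.2, 0.9], [0.4, 0.8], [0.3, 0.7]]))
```

In training, a fresh mask would typically be drawn for each utterance in each epoch, so the model rarely sees the same masked input twice. Fusion weights, if not equal, would usually be tuned on a development set.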