Robust Speaker Recognition Using Improved GFCC and Adaptive Feature Selection
Xingyu Zhang, Xia Zou, Meng Sun, Penglong Wu
DOI: https://doi.org/10.1007/978-3-030-16946-6_13
2020-01-01
Abstract: Speaker recognition systems perform well in noise-free environments, but their performance deteriorates severely in the presence of noise. At the front end of such systems, Mel-Frequency Cepstral Coefficients (MFCC), or the relatively noise-robust Gammatone Frequency Cepstral Coefficients (GFCC), are commonly used as the time-frequency feature. To further improve the noise robustness of GFCC, signal processing techniques such as DC removal, pre-emphasis, and Cepstral Mean and Variance Normalization (CMVN) are investigated in the extraction of GFCC. Taking into account the advantages and disadvantages of MFCC and GFCC, an adaptive strategy is proposed that selects between the two features based on the quality of the speech. Experiments were conducted on the TIMIT dataset to evaluate the approach. Compared with ordinary GFCC and MFCC features, the proposed method significantly reduces the equal error rate (EER) on speech data across a range of signal-to-noise ratios (SNRs).
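The three signal processing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pre-emphasis coefficient `alpha=0.97` (a common default, not a value taken from the paper), and the per-utterance form of CMVN are all assumptions made for illustration. The sketch assumes NumPy is available and that cepstral features are stored as a frames-by-coefficients array.

```python
import numpy as np


def remove_dc(signal):
    """Subtract the mean so the waveform is centred on zero."""
    signal = np.asarray(signal, dtype=float)
    return signal - signal.mean()


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used default.
    """
    signal = np.asarray(signal, dtype=float)
    # The first sample has no predecessor and is passed through unchanged.
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))


def cmvn(features, eps=1e-8):
    """Per-utterance CMVN (assumed variant).

    Normalises each cepstral dimension (column) to zero mean and
    unit variance over all frames (rows).
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # eps guards against division by zero for constant dimensions.
    return (features - mean) / (std + eps)
```

In a pipeline of this kind, `remove_dc` and `pre_emphasis` would typically be applied to the waveform before the filterbank analysis, and `cmvn` to the resulting cepstral coefficients. The order and exact settings used in the paper are not stated in the abstract.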