Speaker Recognition Method Based on Statistical Features of Spectrograms and CNN

Xi Chen, Yonghui Wang, Lianming Wang, Jieqiong Yu
DOI: https://doi.org/10.1145/3331453.3361316
2019-01-01
Abstract: This study proposes a method for obtaining stable pronunciation features for speaker recognition, using a statistical strategy that draws on the physiological characteristics of human pronunciation and on bionic cognitive processing of a speaker's voice. Because a spectrogram reflects the frequency information in a speaker's voice, spectrograms are used to analyze the samples. An individual's voice signal is first split into short-duration segments, and a logarithmic spectrogram is generated for each segment. To collect the energy information of each logarithmic spectrogram at a particular frequency, linear superposition is used to obtain an energy distribution map of the different frequency components in the speaker's voice. Next, a certain number of spectrograms are superimposed as a group to reduce the sample size and to accurately obtain stable pronunciation features. Finally, the spectrogram features are used to train a convolutional neural network for speaker recognition. Experimental results show that the proposed method achieves high stability and high recognition rates.
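The preprocessing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no frame length, hop size, segment length, or group size, so the values below are arbitrary, and the function names (`split_segments`, `log_spectrogram`, `superimpose`, `energy_profile`, `grouped_features`) are hypothetical. "Linear superposition" is read here as an element-wise sum, which is one plausible interpretation. The CNN training stage is omitted.

```python
import cmath
import math


def split_segments(signal, seg_len):
    """Split a signal into non-overlapping short segments of seg_len samples."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]


def log_spectrogram(segment, frame_len=64, hop=32, eps=1e-10):
    """Log-power spectrogram: a list of frames, each a list of per-bin values.

    Uses a Hann window and a naive DFT so the sketch needs no external
    libraries. frame_len and hop are assumed values, not from the paper.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    spec = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frame = [segment[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(math.log10(abs(acc) ** 2 + eps))
        spec.append(row)
    return spec


def superimpose(spectrograms):
    """Element-wise sum of equally shaped spectrograms (assumed superposition)."""
    n_frames = len(spectrograms[0])
    n_bins = len(spectrograms[0][0])
    return [[sum(s[t][k] for s in spectrograms) for k in range(n_bins)]
            for t in range(n_frames)]


def energy_profile(spectrogram):
    """Sum a spectrogram over time to get one energy value per frequency bin."""
    n_bins = len(spectrogram[0])
    return [sum(frame[k] for frame in spectrogram) for k in range(n_bins)]


def grouped_features(signal, seg_len=256, group_size=4):
    """Segment the signal, compute log spectrograms, and superimpose in groups."""
    specs = [log_spectrogram(seg) for seg in split_segments(signal, seg_len)]
    return [superimpose(specs[i:i + group_size])
            for i in range(0, len(specs) - group_size + 1, group_size)]


if __name__ == "__main__":
    # Synthetic test tone standing in for a voice recording.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(2048)]
    features = grouped_features(tone)
    profile = energy_profile(features[0])
    print(len(features), "grouped feature maps")
    print("dominant frequency bin:", profile.index(max(profile)))
```

Each grouped map is a two-dimensional array (time frames by frequency bins) that could be fed to a CNN as a single-channel image. Grouping cuts the number of training samples by a factor of `group_size`, which matches the abstract's stated aim of reducing sample size.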