A Combined Feature Approach for Speaker Segmentation Using Convolution Neural Network

Jiang Zhong, Pan Zhang, Xue Li
DOI: https://doi.org/10.1007/978-3-319-77383-4_54
2018-01-01
Abstract: In this paper, a speaker segmentation algorithm based on a combined-feature approach using a convolutional neural network (CNN) is proposed. It addresses the segmentation of dialogue speech with partial prior knowledge in a call-center environment. For the first time, Mel-Frequency Cepstral Coefficient (MFCC) features and spectrogram features are combined as the CNN input, which is used to train a model of the speakers' voice features and to estimate the change points. In the experiments, a real-world database of dialogue speech from the insurance sales and real estate sales industries is used to compare the proposed approach with the Bayesian Information Criterion (BIC) approach using different acoustic feature sets. The results show that the overall performance is improved and that the proposed algorithm achieves better segmentation.
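For readers unfamiliar with the baseline, the sketch below illustrates the standard single-change ΔBIC test commonly used for speaker change detection. It is not the authors' CNN method, and the abstract does not state which BIC variant or settings were used. It assumes that each side of a candidate split is modeled by one full-covariance Gaussian and that the analysis window contains at most one change. The function names `log_det_cov`, `delta_bic`, and `detect_change`, the penalty weight `lam`, and the edge `margin` are hypothetical illustrations. The input `x` stands in for per-frame acoustic features such as MFCCs, with shape (frames, dims).

```python
import numpy as np


def log_det_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of x, shape (frames, dims)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _sign, logdet = np.linalg.slogdet(cov)
    return float(logdet)


def delta_bic(x: np.ndarray, i: int, lam: float = 1.0) -> float:
    """ΔBIC for splitting window x at frame i: one Gaussian for the whole window vs. one per side.

    Positive values favor a speaker change at frame i.
    """
    n, d = x.shape
    # Extra parameters of the two-Gaussian hypothesis: one more mean (d) and covariance (d(d+1)/2).
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return (0.5 * n * log_det_cov(x)
            - 0.5 * i * log_det_cov(x[:i])
            - 0.5 * (n - i) * log_det_cov(x[i:])
            - lam * penalty)


def detect_change(x: np.ndarray, lam: float = 1.0, margin: int = 20):
    """Return the frame index maximizing ΔBIC if that maximum is positive, else None.

    `margin` frames are kept on each side so both sub-window covariances are estimated
    from enough data (hypothetical default).
    """
    n = x.shape[0]
    if n <= 2 * margin:
        raise ValueError("window too short for the requested margin")
    candidates = range(margin, n - margin)
    scores = [delta_bic(x, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

In practice the window slides or grows over the feature stream, and `lam` is tuned on development data. That sensitivity to a hand-set threshold is one reason learned models such as the CNN described above are attractive.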
What problem does this paper attempt to address?