Application of Hierarchical Clustering Analysis for Vocal Feature Extraction

Jiaqi Ai, Yi Zuo, Junxia Liu, Peichao He, Tieshan Li, C. L. Philip Chen
DOI: https://doi.org/10.1109/csde48274.2019.9162362
2019-01-01
Abstract: Since the Mel-Frequency Cepstral Coefficient (MFCC) was first proposed in the 1980s, it has become the most widely used speech feature in the field of automatic speech processing. MFCC employs Mel filters to map the vocal frequency onto the Mel scale, which is related logarithmically to the physical frequency. However, most MFCC implementations use 26 uniform filter-bank (Fbank) blocks without considering the dynamic characteristics of the cepstrum coefficients. To address this issue, this paper proposes a hierarchical clustering approach to analyze the inverse discrete cosine transform cepstrum coefficient (IDCT CC), using cosine similarity to measure the distribution of blocks of the IDCT CC in the frequency domain. Our method extracts a 14-dimensional vocal feature vector, which we call the C-vector. In the experiments, we investigated the performance of the C-vector on speaker identification (SI) tasks, employing a Gaussian mixture model (GMM) to train SI models based on the C-vector. The results showed that the identification accuracy of the C-vector was better than that of MFCC.
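As a rough illustration of the clustering idea described in the abstract, the sketch below groups a handful of toy "blocks" by agglomerative (bottom-up hierarchical) clustering, using cosine similarity as the merge criterion. It is not the authors' procedure: the average-linkage rule, the function names `cosine_similarity` and `agglomerate`, and the toy block vectors are all assumptions made for illustration, and the abstract does not specify how the IDCT CC blocks are represented or merged.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def agglomerate(blocks, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    Assumption: average linkage, i.e. cluster similarity is the mean
    pairwise cosine similarity between their members.
    """
    clusters = [[i] for i in range(len(blocks))]
    while len(clusters) > n_clusters:
        best = None  # (similarity, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [
                    cosine_similarity(blocks[i], blocks[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical toy data: five blocks, each described by a 3-value vector.
blocks = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
groups = agglomerate(blocks, 3)
print(groups)
```

In this toy run, blocks 0 and 1 point in nearly the same direction, as do blocks 2 and 3, so they are merged first and block 4 remains on its own. Under the same assumptions, reducing 26 uniform blocks to 14 groups would correspond to calling `agglomerate` with `n_clusters=14` on 26 block vectors.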