Integrating the energy information into MFCC

Fang Zheng, Guoliang Zhang
DOI: https://doi.org/10.21437/icslp.2000-96
2000-01-01
Abstract: Mel-Frequency Cepstrum Coefficients (MFCC), introduced in 1980 by Davis and Mermelstein (2), are a widely used feature set in automatic speech recognition systems. In the traditional implementation, the 0th coefficient is excluded on the grounds that it is somewhat unreliable. In this paper, we analyze this term and find that it can be regarded as the generalized frequency band energy (FBE) and is hence useful, resulting in the FBE-MFCC. We also propose a better analysis of the frame energy, called auto-regressive analysis, which performs better than its 1st- and/or 2nd-order differential derivatives. Experiments show that the FBE-MFCC and the frame energy, together with their corresponding auto-regressive analysis coefficients, form a better combination, reducing the syllable error rate (SER) by 10.0% across a giant speech database compared to the traditional MFCC with its corresponding auto-regressive analysis coefficients.
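
To illustrate why the 0th coefficient can be read as an energy-like term, here is a minimal sketch. It assumes the common formulation in which the cepstrum is an unnormalized DCT-II of the log filter-bank energies; the paper's exact definition of the generalized FBE may differ. The function name `dct2_unnormalized` and the input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def dct2_unnormalized(x):
    """Unnormalized DCT-II: c[n] = sum_k x[k] * cos(pi * n * (k + 0.5) / K)."""
    K = len(x)
    n = np.arange(K)[:, None]   # cepstral index (rows)
    k = np.arange(K)[None, :]   # filter-bank channel index (columns)
    basis = np.cos(np.pi * n * (k + 0.5) / K)
    return (basis * x[None, :]).sum(axis=1)

# Hypothetical log filter-bank energies for a single frame (illustrative only).
log_fbe = np.log(np.array([1.0, 2.0, 4.0, 8.0]))

cepstrum = dct2_unnormalized(log_fbe)
c0 = cepstrum[0]

# For n = 0 every cosine equals 1, so c0 is the sum of the log band energies.
print(c0, log_fbe.sum())
```

Under this assumption, the 0th coefficient is simply the sum of the log band energies, so it tracks the overall energy across the frequency bands rather than the spectral shape captured by the higher coefficients.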