Abstract: Noise-robust speech recognition has become an important area of research in recent years. The fact that human listeners can recognize speech in the presence of strong noise has inspired researchers to imitate aspects of human auditory perception in automatic speech recognition. This has led to sub-band based speech recognition, in which the full-band speech is split into several sub-bands and each sub-band is processed separately. The resulting multi-band features can be combined in various ways to carry out the speech recognition task. Reported results have shown the superiority of this technique for speech recognition in strong noise conditions. In this paper, we briefly review multi-band feature extraction. We then propose a block discrete cosine transform (BDCT) whose kernel transformation matrix is derived from a decomposition of the kernel of the discrete cosine transform (DCT). We show that the BDCT approximates the DCT in retaining information while decorrelating a sequence. When the BDCT replaces the DCT in converting the mel-frequency filter bank energies (FBEs) to cepstral coefficients, a new kind of MFCC is obtained. We call these new features block discrete cosine transform based MFCCs (BMFCCs) and show that a sub-band processing idea is implicit in them, since the BDCT automatically divides the mel-frequency FBEs into two sub-bands. We report various speech recognition results using the BMFCCs, together with comparisons against multi-band MFCCs and full-band MFCCs, to elaborate the properties of the BMFCCs.
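The two-sub-band behaviour described above can be illustrated with a minimal sketch. The abstract does not give the BDCT kernel itself, so the code below assumes a block-diagonal matrix built from two half-length orthonormal DCT-II blocks; this is an illustrative stand-in, not necessarily the kernel derived in the paper. The names `dct_matrix`, `block_dct_matrix`, and `bmfcc_sketch` are hypothetical.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def block_dct_matrix(n):
    """Assumed BDCT stand-in: two (n/2)-point DCT-II blocks on the diagonal.

    ASSUMPTION: the paper derives its kernel from a decomposition of the
    DCT kernel; the exact form is not stated in the abstract.
    """
    if n % 2 != 0:
        raise ValueError("n must be even to split into two sub-bands")
    half = n // 2
    block = dct_matrix(half)
    mat = [[0.0] * n for _ in range(n)]
    for r in range(half):
        for c in range(half):
            mat[r][c] = block[r][c]                # lower sub-band block
            mat[half + r][half + c] = block[r][c]  # upper sub-band block
    return mat


def bmfcc_sketch(log_fbes):
    """Apply the assumed block transform to a vector of log filter bank energies."""
    mat = block_dct_matrix(len(log_fbes))
    return [sum(m * v for m, v in zip(row, log_fbes)) for row in mat]
```

Under this assumption, the first half of the output depends only on the lower half of the filter bank energies and the second half only on the upper half, so a disturbance confined to one sub-band leaves the coefficients of the other untouched. That locality is the sub-band property the abstract attributes to the BMFCCs.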
