Abstract: An auditory-based feature extraction algorithm is presented. We call the new features cochlear filter cepstral coefficients (CFCCs); they are defined using a recently developed auditory transform (AT) plus a set of modules that emulate the signal processing functions of the cochlea. The CFCC features are applied to a speaker identification task to address the acoustic mismatch between training and testing environments. The performance of acoustic models trained on clean speech usually drops significantly when they are tested on noisy speech. The CFCC features show strong robustness in this situation. In our experiments, the CFCC features consistently outperform the baseline MFCC features under all three mismatched testing conditions: white noise, car noise, and babble noise. For example, in clean conditions MFCC and CFCC features perform similarly, with accuracy over 96%, but when the signal-to-noise ratio (SNR) of the input signal is 6 dB, the accuracy of the MFCC features drops to 41.2%, while the CFCC features still achieve 88.3%. The proposed CFCC features also compare favorably with perceptual linear predictive (PLP) and RASTA-PLP features. The CFCC features consistently perform much better than PLP. Under white noise, the CFCC features are significantly better than RASTA-PLP, while under car and babble noise they perform similarly to RASTA-PLP.
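
The mismatched conditions described in the abstract depend on test speech corrupted at a known SNR, such as 6 dB. The following is a minimal sketch of how such a test signal could be constructed, not the procedure used in the paper. The function names `mix_at_snr` and `measured_snr_db`, the 16 kHz sampling rate, and the synthetic sine-wave stand-in for speech are all illustrative assumptions. The only fact relied on is the standard definition SNR (dB) = 10 log10(P_signal / P_noise).

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)   # average power of the clean signal
    p_noise = np.mean(noise ** 2)   # average power of the unscaled noise

    # Solve 10*log10(p_clean / (gain^2 * p_noise)) = snr_db for gain.
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_p_noise / p_noise)
    return clean + gain * noise


def measured_snr_db(clean, noisy):
    """Recompute the SNR of a mixture, given the clean reference signal."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.asarray(clean, dtype=float) ** 2)
                           / np.mean(residual ** 2))


# Synthetic stand-in for one second of speech at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
white = rng.standard_normal(len(clean))   # white-noise condition

noisy = mix_at_snr(clean, white, snr_db=6.0)
print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

The same helper would apply to car or babble noise by passing a recorded noise segment in place of `white`. Features such as MFCC or CFCC would then be extracted from `noisy` and scored against models trained on clean speech.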

Fractional Fourier Transform Based Auditory Feature for Language Identification

Forensic Speech Information Hiding Using Fractional Cosine-Cepstrum Transform

Auditory Features with Vocal Track Length Normalization for Language Identification

Time-Frequency Cepstral Features and Combining Discriminative Training for Phonotactic Language Recognition

Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

Detection of Left-Sided and Right-Sided Hearing Loss Via Fractional Fourier Transform.

Factor Analysis for Language Identification Based on Phoneme Recognition

An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification under Mismatched Conditions

Variant Time-Frequency Cepstral Features for Speaker Recognition

Auditory model-based speech feature extraction and its application to speaker identification

An Auditory Feature Extraction Method Based on Forward-Masking and Its Application in Robust Speaker Identification and Speech Recognition.

A High-Performance Auditory Feature for Robust Speech Recognition

Robust Speaker Identification Using An Auditory-Based Feature

Multi-resolution Time Frequency Feature and Complementary Combination for Short Utterance Speaker Recognition

Detection of fricative and vowels in speech signals

An Auditory System-Based Feature for Robust Speech Recognition.

Detection-based accented speech recognition using articulatory features.

A Forward Masking Auditory Model And Its Application In Speaker Identification And Speech Recognition

Design and implementation of speech recognition algorithm based on frequency range

High-resolution Acoustic Modeling and Compact Language Modeling of Language-Universal Speech Attributes for Spoken Language Identification.

Spoken Language Identification Using Hybrid Feature Extraction Methods