Abstract: Human speaker recognition experts often examine the speech spectrogram at several different scales, especially when the utterance is short. Inspired by this practice, this paper proposes a novel multi-resolution time-frequency (MRTF) feature extraction method. The feature is obtained by applying a 2-dimensional discrete cosine transform (DCT) at multiple scales to the time-frequency spectrogram matrix, then selecting and combining the transformed elements from each scale into the final feature. Compared with traditional Mel-Frequency Cepstral Coefficient (MFCC) feature extraction, the proposed method makes better use of multi-resolution time-frequency information. We also propose three complementary strategies for combining MFCC and MRTF: at the feature level, at the i-vector level, and at the score level. Comparing their performance, we find that combination at the i-vector level gives the best results. On the three NIST 2008 Speaker Recognition Evaluation datasets, the proposed method improves performance more for short utterances than for long utterances. After combination, we achieve an EER of 11.32% and a MinDCF of 0.054 on the 10sec-10sec trials of the male dataset, an absolute EER improvement of 3% over the best reported result in this field.
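The multi-scale 2-D DCT idea can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the scales, the block layout, or the rule for selecting coefficients, so the function name `mrtf_sketch`, the `scales` and `keep` parameters, and the choice of keeping the low-order `keep` x `keep` coefficients from the first block at each scale are all assumptions made for illustration.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct_2d(block):
    """2-D DCT-II of a block: C_freq * block * C_time^T."""
    n_rows, n_cols = len(block), len(block[0])
    return matmul(matmul(dct_basis(n_rows), block),
                  transpose(dct_basis(n_cols)))


def mrtf_sketch(spectrogram, scales=(4, 8), keep=2):
    """Hypothetical multi-resolution feature (assumed selection rule).

    spectrogram: list of rows, one row per frequency bin, one column per frame.
    scales: block widths in frames, one 2-D DCT per width.
    keep: number of low-order coefficients kept along each axis.
    """
    n_frames = len(spectrogram[0])
    features = []
    for width in scales:
        if width > n_frames:
            continue  # skip scales longer than the utterance
        block = [row[:width] for row in spectrogram]  # first `width` frames
        coeffs = dct_2d(block)
        for r in range(keep):
            features.extend(coeffs[r][:keep])  # low-order time-frequency terms
    return features
```

Calling `mrtf_sketch(spec, scales=(4, 8), keep=2)` on a spectrogram with at least 8 frames returns 8 values: 4 coefficients per scale, concatenated in scale order. A real system would more likely slide the blocks over the whole utterance and choose the retained coefficients empirically.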

A Robust Speech Recognition Based on the Feature of Weighting Combination ZCPA

Weighting Observation Vectors for Robust Speech Recognition in Noisy Environments.

A new weighted feature approach based on GA for speech recognition

A New Feature In Speech Recognition Based On Wavelet Transform

Auditory model-based speech feature extraction and its application to speaker identification

Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting.

A Robust Speech Feature - Perceptive Scalogram Based on Wavelet Analysis

Robust MMSE-FW-LAASR Scheme at Low SNRs

Weighted Cluster-Range Loss and Criticality-Enhancement Loss for Speaker Recognition

Noise Robust Speaker Recognition Based on Adaptive Frame Weighting in GMM for i-Vector Extraction.

Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System

Accent Recognition with Hybrid Phonetic Features

An Efficient Robust Asr System Based On The Combination Of Speech Enhancement And Hmm Adaptation

Weighted mel-cepstrum for speech analysis

Multi-feature Combination for Speaker Recognition

Improved speech recognition algorithm based on MFCC feature

Combining Speech Enhancement and Discriminative Feature Extraction for Robust Speaker Recognition

Multi-resolution Time Frequency Feature and Complementary Combination for Short Utterance Speaker Recognition

Robust Speech Recognition Method Based on Discriminative Learning of Environmental Features

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model