Abstract: In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine it with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs, which predominantly capture vocal tract information, use only the magnitude of the Fourier transform of time-domain speech frames; phase information is ignored. Because phase information contains rich voice source information, it is expected to be highly complementary to MFCCs. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in phase caused by the clipping position of the input speech, and the combination of the phase information and MFCCs performed remarkably better than MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy or low signal-to-noise ratio (SNR), and models trained on noisy speech are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary and non-stationary noise added, were used to evaluate the proposed method. MFCCs outperformed the phase information for clean speech. For noisy speech, however, the degradation of the phase information was significantly smaller than that of MFCCs. With models trained on clean speech, the phase information alone even outperformed MFCCs in many cases. Deleting unreliable frames (frames with low energy or low SNR) improved speaker identification performance significantly. Integrating the phase information with MFCCs reduced the speaker identification error rate by about 30%-60% relative to the standard MFCC-based method.
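The abstract notes that raw phase depends on where the input speech is clipped. The following is a minimal sketch of that effect and of one generic way to remove it, not the authors' exact extraction method: it references the phase at each harmonic to the phase at a base bin, which cancels a circular time shift by the DFT shift theorem. The function name `relative_phase`, the frame length, the base bin, and the synthetic signal are all illustrative assumptions.

```python
import numpy as np

def relative_phase(frame, base_bin, harmonics):
    """Phase at integer multiples of base_bin, referenced to the phase at base_bin.

    Illustrative assumption: only exact harmonics of base_bin are used, so the
    correction h * theta[base_bin] is unambiguous modulo 2*pi.
    """
    theta = np.angle(np.fft.rfft(frame))
    h = np.asarray(harmonics)
    corrected = theta[h * base_bin] - h * theta[base_bin]
    # Wrap the result back into (-pi, pi].
    return np.angle(np.exp(1j * corrected))

# Synthetic periodic frame (hypothetical data): a fundamental at bin 4 plus three harmonics.
N = 256
base_bin = 4
harmonics = [2, 3, 4]
n = np.arange(N)
frame = (np.cos(2 * np.pi * base_bin * n / N + 0.3)
         + 0.5 * np.cos(2 * np.pi * 2 * base_bin * n / N + 1.0)
         + 0.3 * np.cos(2 * np.pi * 3 * base_bin * n / N - 0.7)
         + 0.2 * np.cos(2 * np.pi * 4 * base_bin * n / N + 0.2))

# A different "clipping position": the same periodic signal, circularly shifted.
shifted = np.roll(frame, 37)

# Raw phase at the base bin changes with the shift...
raw_a = np.angle(np.fft.rfft(frame))[base_bin]
raw_b = np.angle(np.fft.rfft(shifted))[base_bin]

# ...while the referenced phase at the harmonics does not.
rel_a = relative_phase(frame, base_bin, harmonics)
rel_b = relative_phase(shifted, base_bin, harmonics)

print("raw phase differs:", not np.isclose(raw_a, raw_b))
print("relative phase matches:", np.allclose(rel_a, rel_b))
```

Real speech frames are neither exactly periodic nor circularly shifted, so this cancellation is only approximate in practice; the sketch is meant to show why some normalization of this kind is needed before phase can serve as a feature.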

Combining MFCC and Pitch to Enhance the Performance of the Gender Recognition

Speaker gender recognition based on combining the contribution of MFCC and pitch features

Simplified Deformation Compensation for Emotional Speaker Recognition

Mandarin Isolated Words Recognition Method Based on Pitch Contour

Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

Speech Gender Recognition Based on Gauss Mixture Model

Using Cepstral and Prosodic Features for Chinese Accent Identification

Pitch Synchronized Relative Phase with Peak Error Detection For Noise-robust Speaker Recognition

Multi-resolution Time Frequency Feature and Complementary Combination for Short Utterance Speaker Recognition

Combination of Pitch Synchronous Analysis and Fisher Criterion for Speaker Identification

On the use of phase information-based joint factor analysis for speaker verification under channel mismatch condition

Single-channel speech separation integrating pitch information based on a multi task learning framework

Speaker Recognition Using DMFCC over Telephone Channels

Adaptive Gaussian Mixture Model and Its Application in Speaker Recognition

Gender Identification using MFCC for Telephone Applications - A Comparative Study

Gender-dependent Feature Extraction for Speaker Recognition

Using MCE Algorithm to Improve the Performance of Speaker Recognition

Mandarin accent identification based on GMM with multi-feature fusion

Robust GMM Based Gender Classification Using Pitch and RASTA-PLP Parameters of Speech

Design and Implementation of a Real-Time Speaker Identification System with Improved GMM

A Speaker Verification Method Based on MFCC and Prosodic Features