Abstract: The performance of speech recognition systems relies on how consistent speech features remain, and how well they adapt, under complex conditions across the training and testing stages. Traditional systems usually perform poorly under adverse noisy conditions and are not applicable to most real-world problems. In this paper, we investigate the speech feature extraction problem in noisy environments and propose a novel approach based on Gabor filtering and tensor factorization. Recent physiological and psychoacoustic experimental results suggest that localized spectro-temporal features are essential for auditory perception. To exploit this property, we represent the speech signal as a general higher-order tensor and employ two-dimensional Gabor functions with different scales and directions to analyze localized patches of the power spectrogram. Nonnegative Tensor PCA with sparse constraints is then proposed to learn the projection matrices from multiple interrelated feature subspaces. The objective of the sparse constraints is to preserve the statistical characteristics of clean speech data by finding projection matrices of the speech subspaces, and to reduce the noise components whose distributions differ from those of clean speech. A multifactor analysis method is proposed to extract robust sparse features by processing the data samples in tensor structure. The simulation results indicate that the proposed method improves speech recognition performance, especially in noisy environments, compared with traditional speech feature extraction methods.
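
As a rough illustration of the Gabor analysis step described in the abstract, the following minimal sketch filters a toy power spectrogram with real-valued two-dimensional Gabor kernels at several scales and directions and stacks the responses into a fourth-order structure indexed by scale, direction, frequency and time. The function names (`gabor_kernel`, `filter_same`, `gabor_tensor`), the kernel size, the wavelengths, the orientations and the toy data are all assumptions made for illustration, not the paper's configuration. The Nonnegative Tensor PCA step is not shown.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real 2D Gabor kernel: Gaussian envelope times a cosine carrier (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_same(image, kernel):
    """Correlate image with kernel, zero-padding so the output matches the input size."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i, kernel_row in enumerate(kernel):
                for j, weight in enumerate(kernel_row):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += weight * image[rr][cc]
            out[r][c] = acc
    return out


def gabor_tensor(spectrogram, wavelengths, thetas, size=5):
    """Stack responses as nested lists indexed [scale][direction][frequency][time]."""
    return [
        [
            filter_same(spectrogram, gabor_kernel(size, w, t, sigma=0.5 * w))
            for t in thetas
        ]
        for w in wavelengths
    ]


# Toy "power spectrogram": 8 frequency bins by 10 frames (hypothetical data).
spec = [[float((f + t) % 4) for t in range(10)] for f in range(8)]
tensor = gabor_tensor(
    spec,
    wavelengths=[2.0, 4.0],                      # assumed scales
    thetas=[0.0, math.pi / 4, math.pi / 2],      # assumed directions
)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
```

Here `shape` gives the sizes of the scale, direction, frequency and time modes. In practice the spectrogram would come from a short-time Fourier transform of real speech, and an array library would replace the nested loops.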

Nonnegative Tensor PCA and Application to Speaker Recognition in Noise Environments

Auditory Sparse Representation for Robust Speaker Recognition Based on Tensor Structure

Robust Speaker Modeling Based on Constrained Nonnegative Tensor Factorization

Maximum Likelihood I-Vector Space Using PCA for Speaker Verification

Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization

Non-negative Tensor Factorization for Speech Enhancement

Robust speech feature extraction based on Gabor filtering and tensor factorization

Multifactor Sparse Feature Extraction Using Convolutive Nonnegative Tucker Decomposition

Robust Multifactor Speech Feature Extraction Based on Gabor Analysis

A Novel I-Vector Framework Using Multiple Features and PCA for Speaker Recognition in Short Speech Condition

Speech Enhancement with Nonnegative Dictionary Training and RPCA

Experimental evaluation of a new speaker identification framework using PCA

TLS-NAP Algorithm for Text-Independent Speaker Recognition

Exploiting PCA classifiers to speaker recognition

An Auditory Neural Feature Extraction Method for Robust Speech Recognition

Tensor RPCA by Bayesian CP Factorization with Complex Noise

Improved multitaper PNCC feature for robust speaker verification

Sparse Nonnegative Matrix Factorization Strategy for Cochlear Implants

Group Sparse Features for Speech Emotion Perception in Tensor Space

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

The Effectiveness of ICA-based Representation: Application to Speech Feature Extraction for Noise Robust Speaker Recognition