Abstract: The performance of speech recognition systems relies on speech features that stay consistent between the training and testing stages and adapt to complex conditions. Traditional systems usually perform poorly under adverse noisy conditions and are therefore not applicable to most real-world problems. In this paper, we investigate the problem of speech feature extraction in noisy environments and propose a novel approach based on Gabor filtering and tensor factorization. Recent physiological and psychoacoustic experiments suggest that localized spectro-temporal features are essential for auditory perception. To exploit this property, we represent the speech signal as a general higher-order tensor and use two-dimensional Gabor functions with different scales and directions to analyze localized patches of the power spectrogram. Nonnegative Tensor PCA with sparse constraints is then proposed to learn the projection matrices from multiple interrelated feature subspaces. The sparse constraints serve two purposes: they preserve the statistical characteristics of clean speech by finding projection matrices of the speech subspaces, and they reduce noise components whose distributions differ from those of clean speech. A multifactor analysis method is proposed to extract robust sparse features by processing the data samples in tensor structure. The simulation results indicate that the proposed method improves speech recognition performance, especially in noisy environments, compared with traditional speech feature extraction methods.
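The Gabor analysis step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel size, wavelengths, number of orientations, the use of a log spectrogram, and the function names `gabor_kernel`, `filter_same`, and `gabor_features` are all assumptions chosen for illustration. The sketch filters a power spectrogram with real-valued two-dimensional Gabor kernels at several scales and directions and stacks the responses into a fourth-order tensor (scale, direction, frequency, time).

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real 2D Gabor function: Gaussian envelope times a cosine carrier.
    All parameter values used below are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    # Subtract the mean so that flat spectrogram regions give zero response.
    return kernel - kernel.mean()

def filter_same(image, kernel):
    """2D linear convolution via zero-padded FFT, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]

def gabor_features(power_spec, wavelengths=(4.0, 8.0), n_orient=4, size=9):
    """Return a tensor of shape (scales, directions, freq_bins, frames)."""
    log_spec = np.log(power_spec + 1e-10)  # small offset avoids log(0)
    responses = []
    for lam in wavelengths:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            kern = gabor_kernel(size, lam, theta, sigma=0.5 * lam)
            responses.append(filter_same(log_spec, kern))
    return np.stack(responses).reshape(
        len(wavelengths), n_orient, *power_spec.shape
    )

# Usage with a synthetic "spectrogram" (20 frequency bins, 30 frames).
rng = np.random.default_rng(0)
spec = rng.standard_normal((20, 30)) ** 2
features = gabor_features(spec)
print(features.shape)
```

A tensor of this kind is the sort of multi-way input that a tensor factorization such as the sparse Nonnegative Tensor PCA mentioned above would operate on. The factorization itself is not sketched here because the abstract does not specify it.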

Feature Extraction Based on Wavelet Packet-LPCC in Speaker Recognition

Research on Speaker-Dependent Isolated-Word Speech Recognition System

A New Speech Feature Extracted by Wavelet Analysis & Mel-Frequency Filtering

The Car Plate Chinese Character Feature Extraction Based on Wavelet

Auditory model-based speech feature extraction and its application to speaker identification

Application of a New Mixed Feature in Speaker Identification

Improvement on Automatic Speech Segmentation Using Wavelet Packet Transform Features

Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

Stationary wavelet Filtering Cepstral coefficients (SWFCC) for robust speaker identification

Speech Feature Extraction Based on Linear Prediction Residual

Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring

Speaker Identification based on LSP and Gaussian Mixture Model

A Study of Feature Parameters Based on LPC Analysis with Applications to Speaker Identification

Speech Feature Parameter Extraction Based on HHT and Its Application in Speaker Recognition

Robust Multifactor Speech Feature Extraction Based on Gabor Analysis

Robust speech feature extraction based on Gabor filtering and tensor factorization

A New Feature In Speech Recognition Based On Wavelet Transform

Improving Short-Duration Speaker Recognition by Joint Bark-Wavelet Acoustic Feature Coupling and Triplet Dual-Attention Mechanism Network

Features Extraction for Lhasa Tibetan Speech Recognition

Study of the Acoustic Features in Speaker Recognition

Auditory Model Based Speech Feature Extraction and Its Application to Speaker Identification