Abstract: Speaker Verification (SV) systems mainly involve two stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta-Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) coefficients, and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP) coefficients. A complete comparison of different combinations of these features is discussed. On the other hand, the major weakness of a conventional Support Vector Machine (SVM) classifier is its reliance on generic, traditional kernel functions to compare data points, even though the kernel function has a great influence on SVM performance. In this work, we propose combining two SVM-based classifiers, one with a linear kernel and one with a Gaussian Radial Basis Function (RBF) kernel, with a Logistic Regression (LR) classifier. The combination follows a parallel-structure approach in which several voting rules for taking the final decision are considered. Results show that using the combined features with the combined classifiers significantly improves the performance of the SV system, both with clean speech and in the presence of noise. Finally, to further improve the system in noisy environments, a multiband noise removal technique is proposed as a preprocessing stage.
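The Deltas and Delta-Deltas mentioned above are usually obtained with a regression over neighbouring frames. The sketch below assumes the common formula $d_t = \sum_{k=1}^{N} k\,(c_{t+k} - c_{t-k}) / (2\sum_{k=1}^{N} k^2)$ with a window of $N = 2$ and edge frames repeated; the abstract does not state which window the authors used, and the function names are hypothetical.

```python
import numpy as np


def deltas(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression deltas over a (frames, coeffs) matrix (assumed window n)."""
    num_frames = features.shape[0]
    # Repeat the first/last frame n times so every frame has n neighbours.
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(features.shape, dtype=float)
    for k in range(1, n + 1):
        ahead = padded[n + k : n + k + num_frames]    # c_{t+k}
        behind = padded[n - k : n - k + num_frames]   # c_{t-k}
        out += k * (ahead - behind)
    return out / denom


def stack_with_deltas(cepstra: np.ndarray, n: int = 2) -> np.ndarray:
    """Concatenate static coefficients, Deltas and Delta-Deltas per frame."""
    d1 = deltas(cepstra, n)
    d2 = deltas(d1, n)  # Delta-Deltas are deltas of the deltas
    return np.hstack([cepstra, d1, d2])
```

With 13 static MFCCs per frame (an assumed, though common, choice), `stack_with_deltas` returns 39 values per frame; the same function applies unchanged to BFCC or PLP matrices when testing feature combinations.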
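To make the contrast between the two SVM kernels concrete, the sketch below computes Gram matrices for the linear kernel $k(x, y) = x^\top y$ and the Gaussian RBF kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. The value of `gamma` is an arbitrary assumption, not a parameter reported in the abstract.

```python
import numpy as np


def linear_kernel(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """k(x, y) = <x, y> for every row pair of X and Y."""
    return X @ Y.T


def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float = 0.1) -> np.ndarray:
    """k(x, y) = exp(-gamma * ||x - y||^2) for every row pair (gamma assumed)."""
    # Expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y> to avoid explicit loops.
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

The linear kernel grows with vector magnitude and direction, while the RBF kernel depends only on distance and decays towards 0 for far-apart vectors, which is one reason classifiers built on the two kernels can make complementary errors.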
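The abstract does not list the voting rules used in the parallel structure, so the following is only a hypothetical sketch. It assumes each of the three classifiers (linear SVM, RBF SVM and LR) produces a score that is thresholded into an accept/reject decision, and that the decisions are fused by a majority, AND or OR rule.

```python
from typing import Callable, Dict, List, Sequence


def decide(scores: Sequence[float], thresholds: Sequence[float]) -> List[bool]:
    """Each classifier accepts the claimed identity if its score reaches its threshold."""
    return [s >= t for s, t in zip(scores, thresholds)]


def majority(decisions: Sequence[bool]) -> bool:
    """Accept when strictly more than half of the classifiers accept."""
    return 2 * sum(decisions) > len(decisions)


# Hypothetical rule set; the paper's actual rules are not given in the abstract.
VOTING_RULES: Dict[str, Callable[[Sequence[bool]], bool]] = {
    "majority": majority,
    "and": all,  # accept only if every classifier accepts
    "or": any,   # accept if at least one classifier accepts
}


def fuse(scores: Sequence[float], thresholds: Sequence[float], rule: str = "majority") -> bool:
    """Final accept/reject decision of the parallel structure."""
    return VOTING_RULES[rule](decide(scores, thresholds))
```

For example, with assumed scores `[0.7, 0.4, 0.6]` and thresholds of `0.5`, two of three classifiers accept. The majority and OR rules then accept while the AND rule rejects, illustrating how the choice of rule trades false acceptances against false rejections.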

Improving Performance of Speaker Identification System Using Complementary Information Fusion

Hybrid Silent Speech Interface Through Fusion of Electroencephalography and Electromyography

Spoken Language Identification Using Hybrid Feature Extraction Methods

Multi-feature Combination for Speaker Recognition

Fusion of deep shallow features and models for speaker recognition

Multi-resolution Time Frequency Feature and Complementary Combination for Short Utterance Speaker Recognition

Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers

ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score

A Fishervoice Based Feature Fusion Method for Short Utterance Speaker Recognition

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Speaker Discrimination on Broadcast News and Telephonic Calls Using a Fusion of Neural and Statistical Classifiers

Speaker Identification using MFCC-Domain Support Vector Machine

Audio-Visual Speaker Verification via Joint Cross-Attention

System Combination for Short Utterance Speaker Recognition

Multi-level Fusion of Audio and Visual Features for Speaker Identification

Improving Speaker Verification Performance Against Long-Term Speaker Variability

Score Fusion For Perceptual Evaluation Of Pronunciation Quality

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

Speaker Verification Using Simple Temporal Features and Pitch Synchronous Cepstral Coefficients

Single-channel speech enhancement by using psychoacoustical model inspired fusion framework