Abstract: Speech is a fundamental means of human interaction. Speaker Identification (SI) plays a crucial role in various applications, such as authentication systems, forensic investigation, and personal voice assistants. However, achieving robust and secure SI in both open and closed environments remains challenging. To address this challenge, researchers have explored new techniques that enable computers to better understand and interact with humans. Smart systems leverage Artificial Neural Networks (ANNs), which mimic the human brain, to identify speakers. However, speech signals often suffer from interference that degrades them. The performance of a Speaker Identification System (SIS) is influenced by environmental factors, such as noise in open environments and reverberation in closed environments. This paper investigates SI using Mel-Frequency Cepstral Coefficients (MFCCs) and polynomial coefficients as features, with an ANN as the classifier. To tackle the challenges posed by environmental interference, we propose a novel approach that uses symmetric comb filters for modeling. In closed environments, reverberation arises from multiple reflections of the speech signal; we study its effect on speech and model it with comb filters. We explore the time, Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), and Discrete Sine Transform (DST) domains for feature extraction to determine the best combination for SI in reverberant environments. Simulation results reveal that the DWT outperforms the other transforms, yielding a recognition rate of 93.75% at a Signal-to-Noise Ratio (SNR) of 15 dB. Additionally, we investigate cancelable SI to protect user privacy while maintaining high recognition rates.
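The abstract models reverberation with comb filters but does not give the structure or parameters of its symmetric comb filters. As a rough illustration only, the following Python sketch assumes a textbook feedback comb filter, y[n] = x[n] + g·y[n - D], in which each pass around the loop adds a weaker, delayed echo, and averages a few such filters in parallel in the spirit of Schroeder's reverberator (without its all-pass stages). The names `feedback_comb` and `comb_reverb` and any delay/gain values are hypothetical; this is not the authors' model.

```python
def feedback_comb(x, delay, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay].

    Every trip around the loop adds an echo that arrives `delay` samples
    later and is scaled by `gain`, a crude stand-in for repeated reflections.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if abs(gain) >= 1:
        # |gain| < 1 keeps the filter stable so the echo tail decays.
        raise ValueError("|gain| must be below 1 for a stable, decaying tail")
    y = []
    for n, sample in enumerate(x):
        echo = gain * y[n - delay] if n >= delay else 0.0
        y.append(sample + echo)
    return y


def comb_reverb(x, combs):
    """Average several feedback comb filters run in parallel.

    `combs` is a list of (delay, gain) pairs (hypothetical parameters);
    mutually prime delays give a denser, less periodic echo pattern
    than a single comb filter.
    """
    outputs = [feedback_comb(x, delay, gain) for delay, gain in combs]
    return [sum(column) / len(outputs) for column in zip(*outputs)]
```

For an impulse input, `feedback_comb([1.0] + [0.0] * 9, delay=3, gain=0.5)` returns echoes of 0.5, 0.25, and 0.125 at samples 3, 6, and 9, which makes the geometric decay of the simulated reflections easy to see.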
Our simulation results also show a recognition rate of 97.5% at an SNR of 0 dB using features extracted from the speech signals and their DCTs. For open environments, we implement a robust Automatic Speaker Identification (ASI) system that can handle noise and interference. In this system, we apply Discrete Transforms (DTs), such as the DCT, DST, and DWT, to degraded speech signals to extract robust features. The proposed system incorporates enhancement techniques, such as Spectral Subtraction (SS), Wiener Filtering (WF), Adaptive Wiener Filtering (AWF), and wavelet de-noising, to improve SI performance and accuracy. The results demonstrate the effectiveness of the proposed SIS even under challenging conditions such as low SNR and significant music interference. Features extracted from the signals and their DWTs prove highly beneficial, achieving a recognition rate of 97.5% at 15 dB. Furthermore, wavelet de-noising contributes significantly to removing noise while preserving the essential signal components, which improves performance. We also investigate the sensitivity of the system to telephone channel degradations, as well as the impact of interference and noise. By employing the DWT and innovative modeling techniques, this research advances robust SISs, which have promising applications in domains such as security, personal assistance, and forensics.
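Wavelet de-noising is credited above with removing noise while preserving the signal. The abstract does not state which wavelet, decomposition depth, or threshold rule the authors used, so the following is only a minimal stdlib-Python sketch under assumed choices: a Haar DWT, the universal threshold sigma·sqrt(2 ln N) with sigma estimated from the median absolute finest-scale detail coefficient, and soft thresholding (the classic VisuShrink recipe). All function names, including the test-signal helper `add_white_noise` that adds noise at a chosen SNR, are hypothetical.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def add_white_noise(x, snr_db, seed=0):
    """Return x plus white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = sum(v * v for v in x) / len(x)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in x]


def haar_step(x):
    """One level of the orthonormal Haar DWT, returning (approx, detail)."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Exactly invert haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def soft_threshold(c, t):
    """Shrink c toward zero by t; coefficients with |c| <= t become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, levels=4):
    """Haar DWT, soft-threshold every detail band, then reconstruct."""
    n = len(x)
    if n % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    # Robust noise estimate: median absolute value of the finest detail band.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(n))
    # Rebuild from the coarsest level up; the approximation band is kept as is.
    for d in reversed(details):
        approx = inverse_haar_step(approx, [soft_threshold(c, threshold) for c in d])
    return approx
```

For example, applying `wavelet_denoise(add_white_noise(clean, snr_db=5), levels=4)` to a 1024-sample sine with a 256-sample period gives an output with lower mean-squared error against `clean` than the noisy input. The Haar basis is used here only because it fits in a few lines; smoother wavelets are a common alternative for speech.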

Stationary wavelet Filtering Cepstral coefficients (SWFCC) for robust speaker identification

Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

Bionic Cepstral coefficients (BCC): A new auditory feature extraction to noise-robust speaker identification

Research on Speaker-Depended Isolated-Word Speech Recognition System

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Auditory model-based speech feature extraction and its application to speaker identification

Secure speaker identification in open and closed environments modeled with symmetric comb filters

Feature Extraction Based on Wavelet Packet-LPCC in Speaker Recognition

A Study of Acoustic Features in Arabic Speaker Identification under Noisy Environmental Conditions

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Speech enhancement based on stationary bionic wavelet transform and maximum a posterior estimator of magnitude-squared spectrum

Robust Feature Extraction Using Temporal Context Averaging for Speaker Identification in Diverse Acoustic Environments

Modified MFCCs for Robust Speaker Recognition

Development of High Accuracy Classifier for the Speaker Recognition System

Text-independent Speaker Identification Based on Spectral Weighting Functions

Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Blind speech separation based on undecimated wavelet packet-perceptual filterbanks and independent component analysis

Improving Short-Duration Speaker Recognition by Joint Bark-Wavelet Acoustic Feature Coupling and Triplet Dual-Attention Mechanism Network

Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach

Speaker Identification using MFCC-Domain Support Vector Machine

Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification