Secure speaker identification in open and closed environments modeled with symmetric comb filters
Amira Shafik, Mohamad Monir, Walid El-Shafai, Ashraf A. M. Khalaf, M. M. Nassar, Adel S. El-Fishawy, M. A. Zein El-Din, Moawad I. Dessouky, El-Sayed M. El-Rabaie, Fathi E. Abd El-Samie
DOI: https://doi.org/10.1007/s11042-023-16463-x
IF: 2.577
2024-07-14
Multimedia Tools and Applications
Abstract: Speech is a fundamental means of human interaction. Speaker Identification (SI) plays a crucial role in applications such as authentication systems, forensic investigation, and personal voice assistants. However, achieving robust and secure SI in both open and closed environments remains challenging. To address this, researchers have explored techniques that enable computers to better understand and interact with humans; smart systems, for example, use Artificial Neural Networks (ANNs) to mimic the way the human brain identifies speakers. Speech signals, however, often suffer from interference that degrades them, and the performance of a Speaker Identification System (SIS) depends on environmental factors such as noise in open environments and reverberation in closed environments. This paper investigates SI using Mel-Frequency Cepstral Coefficients (MFCCs) and polynomial coefficients as features, with an ANN as the classifier. To tackle environmental interference, we propose a novel approach that uses symmetric comb filters for modeling. In closed environments, we study the effect of reverberation, which arises from multiple reflections of the speech signal, and we model this effect with comb filters. We explore the time, Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), and Discrete Sine Transform (DST) domains for feature extraction to determine the best combination for SI in reverberant environments. Simulation results reveal that the DWT outperforms the other transforms, yielding a recognition rate of 93.75% at a Signal-to-Noise Ratio (SNR) of 15 dB. Additionally, we investigate cancelable SI to protect user privacy while maintaining high recognition rates.
Our simulation results show a recognition rate of 97.5% at 0 dB using features extracted from the speech signals and their DCTs. For open environments, we implement a robust Automatic Speaker Identification (ASI) system that can handle noise and interference. In this system, Discrete Transforms (DTs) such as the DCT, DST, and DWT are applied to the degraded speech signals to extract robust features. The proposed system also incorporates enhancement techniques, namely Spectral Subtraction (SS), Wiener Filtering (WF), Adaptive Wiener Filtering (AWF), and wavelet de-noising, to improve SI accuracy. The results demonstrate the effectiveness of the proposed SIS even under challenging conditions such as low SNR and significant music interference. Features extracted from the signals and their DWTs prove highly beneficial, achieving a recognition rate of 97.5% at 15 dB. Furthermore, wavelet de-noising removes much of the noise while preserving the essential signal content, which improves performance. We also investigate the sensitivity of the system to telephone channel degradations, interference, and noise. By employing the DWT and the proposed comb-filter modeling, this research contributes to robust SISs for applications in security, personal assistance, and forensics.
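The abstract does not give the comb-filter equations. As a minimal sketch, assuming a standard feedback comb filter (the recursive building block of Schroeder-style artificial reverberators) in place of the authors' symmetric comb design, the reverberation model and the SNR-controlled noise used when reporting results at a given SNR could look like this. `feedback_comb`, `add_noise_at_snr`, `delay`, `gain`, and `seed` are hypothetical names, not taken from the paper.

```python
import math
import random


def feedback_comb(x, delay, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay].

    Assumption: a generic feedback comb, not the paper's symmetric variant.
    Every pass around the loop adds an echo attenuated by `gain`, a crude
    stand-in for repeated wall reflections in a closed room.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if abs(gain) >= 1:
        raise ValueError("|gain| must be below 1 for a stable filter")
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        echo = gain * y[n - delay] if n >= delay else 0.0
        y[n] = xn + echo
    return y


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so that 10*log10(Ps / Pn) equals snr_db."""
    if not signal:
        raise ValueError("signal must not be empty")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Scale so that the scaled noise power equals p_signal / 10^(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(signal, noise)]


# Usage: the impulse response shows an echo every `delay` samples,
# each one `gain` times weaker than the previous one.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(feedback_comb(impulse, delay=2, gain=0.5))
# [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125]
```

Keeping `|gain| < 1` makes the echoes decay geometrically, which is what lets a single recursive stage approximate an exponentially decaying reverberation tail; real reverberators usually combine several comb stages with different delays.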
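The abstract names wavelet de-noising but not the wavelet, the decomposition level, or the thresholding rule. A minimal sketch, assuming a single-level Haar DWT, soft thresholding of the detail coefficients, and the Donoho-Johnstone universal threshold (all assumptions rather than the authors' stated settings), is shown below; `haar_dwt`, `haar_idwt`, `soft_threshold`, and `wavelet_denoise` are hypothetical helper names.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    if not x or len(x) % 2:
        raise ValueError("signal length must be even and non-zero")
    r = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * r for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction up to rounding)."""
    r = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * r, (a - d) * r])
    return out


def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, threshold=None):
    """Single-level Haar de-noising by soft-thresholding the detail band.

    Assumption: Haar wavelet, one level, universal threshold; the paper's
    actual settings are not given in the abstract. If no threshold is
    passed, use sigma * sqrt(2 ln N), with sigma estimated as the median
    absolute detail coefficient divided by 0.6745.
    """
    approx, detail = haar_dwt(x)
    if threshold is None:
        mags = sorted(abs(d) for d in detail)
        sigma = mags[len(mags) // 2] / 0.6745  # upper median for even counts
        threshold = sigma * math.sqrt(2 * math.log(len(x)))
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_idwt(approx, detail)


# Usage: a piecewise-constant signal plus a small alternating disturbance.
# The disturbance lives entirely in the detail band and is removed.
clean = [1.0] * 8 + [3.0] * 8
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
denoised = wavelet_denoise(noisy)
print(max(abs(d - c) for d, c in zip(denoised, clean)))
```

Soft thresholding shrinks every surviving coefficient by the same amount, which avoids the abrupt jumps that hard thresholding can introduce; a practical speech front end would more likely use several decomposition levels and a smoother wavelet.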
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering