An optimized attention based hybrid deep learning framework for automatic speaker identification from speech signals
Venkata Subba Reddy Gade, M. Sumathi
DOI: https://doi.org/10.1007/s11042-024-19996-x
IF: 2.577
2024-08-26
Multimedia Tools and Applications
Abstract: Speaker recognition (SR) identifies speakers from the characteristics of their voice samples and has been researched extensively for many years. Advances in technology have made SR an increasingly popular research topic. Among the many SR approaches documented in the literature, those based on deep learning (DL) are the most advanced and effective, and they achieve higher accuracy. Nevertheless, to establish practical significance, the effects of noise in the input signal and the neglect of important details during the learning phase must be investigated closely. This work presents a novel automated DL-based hybrid framework for the accurate identification of male voice speakers. The voice samples drawn from the input dataset pass through several stages, including pre-processing, feature extraction, feature selection, and SR. First, noise and interference are removed from the audio samples using a two-stage Savitzky-Golay filtering technique (2S-SGF). After denoising, many significant features are extracted from the signal to supply the recognition model with information. A Chaotic Honey Badger Optimization Algorithm (ChHBOA) then selects the most informative of the extracted features. The selected features are passed to the densenet121_self-attention deep convolutional neural network (D121_SAttnDCNN) model, which performs SR. The proposed network includes a self-attention layer to focus on highly informative features. Finally, the model is evaluated comprehensively through simulation on the Python platform. A variety of experiments on the publicly available Voxceleb-1 gender dataset demonstrate the model's performance. The proposed SR model achieved an overall accuracy of 98% and can be applied in various voice-based authentication settings, including forensic management, security, personal smart devices, and remote payment.
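The denoising step named in the abstract can be illustrated with a minimal sketch. It assumes that "two-stage" means two successive Savitzky-Golay smoothing passes; the abstract does not give the filter settings, so the window lengths, polynomial orders, edge padding, and the function names `savgol_coeffs`, `savgol_smooth`, and `two_stage_savgol` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def savgol_coeffs(window: int, order: int) -> np.ndarray:
    """Savitzky-Golay smoothing weights for an odd window and a polynomial order."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than order")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Column j of the design matrix holds offsets**j.
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted polynomial's value at offset 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(x: np.ndarray, window: int, order: int) -> np.ndarray:
    """One smoothing pass; edges are handled by repeating the end samples (an assumption)."""
    half = window // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    # The smoothing weights are symmetric, so convolution equals correlation here.
    return np.convolve(padded, savgol_coeffs(window, order), mode="valid")


def two_stage_savgol(x, first=(11, 3), second=(5, 2)) -> np.ndarray:
    """Apply two successive passes; the (window, order) pairs are illustrative only."""
    stage_one = savgol_smooth(x, *first)
    return savgol_smooth(stage_one, *second)


# Synthetic stand-in for a speech frame: a sine wave plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 400)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = two_stage_savgol(noisy)

mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_denoised = float(np.mean((denoised - clean) ** 2))
print(f"MSE before: {mse_noisy:.4f}, after: {mse_denoised:.4f}")
```

In practice the window lengths would be tuned to the sampling rate of the speech data, since a window that is too long also smooths away the higher-frequency content that later feature extraction relies on.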
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering