Speaker recognition using Improved Butterfly Optimization Algorithm with hybrid Long Short Term Memory network
Venkata Subba Reddy Gade, Sumathi Manickam
DOI: https://doi.org/10.1007/s11042-024-18298-6
IF: 2.577
2024-02-14
Multimedia Tools and Applications
Abstract: Speaker recognition is widely used in applications such as identity verification, electronic voice eavesdropping, surveillance, and voice recognition. In an effective speaker recognition system, extracting and selecting salient, discriminative features is essential for identifying speakers accurately. This research manuscript therefore introduces a novel hybrid framework. First, the input data are acquired from three benchmark databases: the THUYG-20 SRE corpus, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and LibriSpeech. Next, the emotional features are extracted using a hybrid set of feature extraction techniques: amplitude, zero-crossing rate, energy, Root Mean Square (RMS), statistical moments, autocorrelation, and Mel-Frequency Cepstral Coefficients (MFCC). Feature optimization is then carried out with the Improved Butterfly Optimization Algorithm (IBOA), which reduces the computational time and complexity of the recognition model. Finally, a hybrid classifier that combines a Convolutional Neural Network (CNN) with a Long Short Term Memory (LSTM) network is implemented for speaker recognition, and its performance is analyzed in terms of F1-score, specificity, accuracy, Positive Predictive Value (PPV), and sensitivity. The empirical investigation demonstrated that the IBOA-based hybrid LSTM network achieved recognition accuracies of 92.65%, 96.97%, and 96.98% on the LibriSpeech, RAVDESS, and THUYG-20 SRE corpus databases, respectively. These results exceed those of the comparative models: Deep Neural Network (DNN), random forest, K-Nearest Neighbor (KNN), LSTM, Multi-class Support Vector Machine (MSVM), Deep Convolutional Recurrent Neural Network (DCRNN), Golden Ratio aided Neural Network (GRaNN), deep sequential LSTM, and Probabilistic Neural Network (PNN).
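The five evaluation measures named in the abstract have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions, not the authors' evaluation code. It assumes a one-vs-rest view in which one speaker is treated as the positive class; the names Counts, _ratio, and evaluate are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One-vs-rest confusion counts for a single speaker (illustrative)."""
    tp: int  # target speaker correctly recognized
    fp: int  # other speaker wrongly labeled as the target
    tn: int  # other speaker correctly rejected
    fn: int  # target speaker missed


def _ratio(num: float, den: float) -> float:
    # Return 0.0 for an empty denominator instead of raising.
    return num / den if den else 0.0


def evaluate(c: Counts) -> dict:
    """Compute the standard textbook forms of the five measures."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall, true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)   # true negative rate
    ppv = _ratio(c.tp, c.tp + c.fp)           # precision
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for demonstration only; not data from the paper.
    for name, value in evaluate(Counts(tp=8, fp=2, tn=88, fn=2)).items():
        print(f"{name}: {value:.4f}")
```

For a multi-speaker task, one common choice is to compute these per speaker and then average them; the abstract does not state which averaging scheme was used.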
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering