Abstract: Speaker recognition (SR) is the identification of speakers from the characteristics of their voice notes, and it has been researched extensively for many years. Technological advances have made SR an increasingly popular research topic in recent years. Among the many SR approaches documented in the literature, deep learning (DL)-based methods are the most advanced and effective, and they achieve the highest accuracy. Nevertheless, to establish their practical significance, one must closely investigate the effects of noise in the input signal and the neglect of important details during the learning phase. This work presents a novel automated DL-based hybrid framework for the accurate identification of male voice speakers. The voice notes taken from the input dataset pass through several stages: pre-processing, feature extraction, feature selection, and SR. First, noise and interference are removed from the audio samples using a two-stage Savitzky-Golay filtering technique (2S-SGF). After denoising, many significant features are extracted from the input signal to supply the recognition model with information. From the extracted features, a Chaotic Honey Badger Optimization Algorithm (ChHBOA) selects the most informative ones. The densenet121_self-attention deep convolutional neural network (D121_SAttnDCNN) model receives the selected features and uses them to perform SR. The proposed network includes a self-attention layer so that it focuses on highly informative features. Lastly, comprehensive evaluations are carried out through model simulation on the Python platform. A variety of experiments demonstrate the performance of the model, which is assessed on the publicly available Voxceleb-1 gender dataset. The proposed SR model achieved an overall accuracy of 98% and can be applied in different fields of voice-based authentication, including forensic management, security, personal smart devices, and remote payment.
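
The two-stage denoising idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the 2S-SGF parameters, so everything specific here is an assumption: a 5-point window with a quadratic fit (using the standard tabulated Savitzky-Golay coefficients for that case), edge handling by repeating the boundary samples, and a second stage that simply reapplies the same filter. The function names `savgol_5pt` and `two_stage_savgol` are hypothetical and are not taken from the paper.

```python
# Standard tabulated coefficients for 5-point quadratic Savitzky-Golay
# smoothing: [-3, 12, 17, 12, -3] / 35. The window length and polynomial
# order are assumptions; the abstract does not give the 2S-SGF settings.
SG_5PT_QUADRATIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_5pt(signal):
    """Apply one pass of 5-point quadratic Savitzky-Golay smoothing.

    Edges are handled by repeating the first and last samples
    (an assumed padding strategy, chosen for simplicity).
    """
    samples = list(signal)
    if not samples:
        return []
    half = len(SG_5PT_QUADRATIC) // 2
    padded = [samples[0]] * half + samples + [samples[-1]] * half
    return [
        sum(c * padded[i + k] for k, c in enumerate(SG_5PT_QUADRATIC))
        for i in range(len(samples))
    ]


def two_stage_savgol(signal):
    """Hypothetical two-stage variant: run the same smoother twice."""
    return savgol_5pt(savgol_5pt(signal))


if __name__ == "__main__":
    # A single-sample spike stands in for impulsive noise.
    noisy = [0.0] * 5 + [1.0] + [0.0] * 5
    print(two_stage_savgol(noisy))
```

Because the coefficients sum to one and the filter is symmetric, constant and linear trends pass through unchanged away from the edges, while isolated spikes are attenuated. In practice, a library routine such as `scipy.signal.savgol_filter` would normally be used in place of this hand-written pass.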
