Abstract: Speaker identification is the task of identifying a person from their voice using artificial intelligence (AI) methods. The technology is widely used in voice recognition, security, surveillance, electronic voice eavesdropping, and identity verification. Existing methods do not provide sufficient accuracy or robustness. To overcome these issues, this manuscript proposes an efficient speaker identification framework based on a Mask region-based convolutional neural network (Mask R-CNN) classifier whose parameters are optimized using Hosted Cuckoo Optimization (HCO). The objective of the proposed method is to increase accuracy and to improve the robustness of the signal. Initially, the input speech signals are taken from a real-time dataset. To improve robustness, four types of features are extracted from the input speech signal: Mel Frequency Differential Power Cepstral Coefficients (MFDPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Power Normalized Cepstral Coefficients (PNCC), and spectral entropy. The speaker ID is then classified using the Mask R-CNN classifier, and the classifier parameters are optimized using the HCO algorithm. The method is relevant to real-time applications such as telephone banking and fax mailing. The simulation is executed in MATLAB.
The simulation results show that the proposed Mask-R-CNN-HCO method attains accuracy that is 24.16%, 32.18%, 28.43%, 36.4%, and 33.26% higher, sensitivity that is 37.68%, 33.80%, 24.16%, 32.18%, and 28.43% higher, and precision that is 35.88%, 24.16%, 32.18%, 28.43%, and 26.77% higher than the existing methods, namely speaker identification using the K-nearest neighbors (KNN) algorithm, a multiclass support vector machine (MCSVM), a Gaussian Mixture Model–Convolutional Neural Network (GMMCNN) classifier, a deep neural network (DNN), and a Gaussian Mixture Model–Deep Neural Network (GMMDNN) classifier.
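
Of the four feature types listed in the abstract, spectral entropy is the simplest to illustrate. The abstract does not give the exact definition used, so the following is only a minimal sketch under a common assumption: spectral entropy is taken as the Shannon entropy of the normalized power spectrum of one speech frame. The function names `power_spectrum` and `spectral_entropy` are hypothetical, the sketch is in Python rather than the MATLAB used for the simulation, and a naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one frame over the non-negative frequency bins (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame, normalize=True):
    """Shannon entropy of the normalized power spectrum (an assumed definition).

    With normalize=True the result is divided by log2(number of bins),
    so it lies between 0 (energy in one bin) and 1 (flat spectrum).
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: define the entropy as zero
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    if normalize and len(probs) > 1:
        entropy /= math.log2(len(probs))
    return entropy


# Illustrative frames (synthetic, not taken from the paper's dataset).
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63

print(spectral_entropy(tone))     # energy concentrated in one bin: near 0
print(spectral_entropy(impulse))  # flat spectrum: near 1
```

Under this definition, a frame dominated by a single frequency yields an entropy near 0, while a frame with a flat spectrum yields a normalized entropy near 1, which is why the measure is often used to separate tonal or voiced content from noise-like content.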

A stacked convolutional neural network framework with multi-scale attention mechanism for text-independent voiceprint recognition

A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

Text-independent voiceprint recognition via compact embedding of dilated deep convolutional neural networks

Modified layer deep convolution neural network for text-independent speaker recognition

Self-attention Based Speaker Recognition Using Cluster-Range Loss

Voice Presentation Attack Detection Using Convolutional Neural Networks

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

CACRN-Net: A 3D log Mel spectrogram based channel attention convolutional recurrent neural network for few-shot speaker identification

CNN with Phonetic Attention for Text-Independent Speaker Verification.

An optimized attention based hybrid deep learning framework for automatic speaker identification from speech signals

TMS: Temporal multi-scale in time-delay neural network for speaker verification

Self-Attention Networks for Text-Independent Speaker Verification

End-to-End Attention based Text-Dependent Speaker Verification

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification

Bidirectional Attention For Text-Dependent Speaker Verification

An efficient speaker identification framework based on Mask R-CNN classifier parameter optimized using hosted cuckoo optimization (HCO)

Speaker verification using attentive multi-scale convolutional recurrent network

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

Research on Voiceprint Recognition Technology Based on Deep Neural Network

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification