Abstract: Speaker identification is the task of recognizing a person from their voice using artificial intelligence (AI) techniques. It is widely used in voice recognition, security, surveillance, electronic voice eavesdropping, and identity verification. Existing methods do not provide sufficient accuracy or robustness on the speech signal. To overcome these issues, this manuscript proposes an efficient speaker identification framework based on a Mask region-based convolutional neural network (Mask R-CNN) classifier whose parameters are optimized using Hosted Cuckoo Optimization (HCO). The objective of the proposed method is to increase identification accuracy and to improve robustness. Initially, the input speech signals are taken from a real-time dataset. To improve robustness, four types of features are extracted from each input speech signal: Mel Frequency Differential Power Cepstral Coefficients (MFDPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Power Normalized Cepstral Coefficients (PNCC), and spectral entropy. The speaker ID is then classified by the Mask R-CNN classifier, whose parameters are optimized by the HCO algorithm. The method is relevant to real-time applications such as telephone banking and fax mailing. The simulation is implemented in MATLAB.
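The abstract does not define HCO or say how it differs from ordinary cuckoo search. The following is therefore only a sketch of the standard cuckoo search of Yang and Deb, not the authors' HCO. Each iteration applies Lévy-flight moves and then a random-walk "abandonment" step for a fraction `pa` of nest components. Both moves are accepted greedily. The objective `toy_validation_error` is a hypothetical stand-in for a classifier's validation error over two hyperparameters. In the paper's setting the objective would presumably be the Mask R-CNN error, which is not shown here. All function names, parameters, and defaults (`n_nests`, `pa`, `alpha`, `beta`) are assumptions.

```python
import math
import numpy as np


def levy_step(rng, beta, size):
    """Draw heavy-tailed Levy-flight step lengths with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)


def cuckoo_search(objective, lower, upper, n_nests=15, n_iter=200,
                  pa=0.25, alpha=0.01, beta=1.5, seed=0):
    """Standard cuckoo search (minimization) inside box bounds [lower, upper].

    Sketch only: this is plain cuckoo search, not the paper's HCO variant.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    nests = rng.uniform(lower, upper, size=(n_nests, dim))
    fitness = np.array([objective(x) for x in nests])

    def greedy_update(candidates):
        # Keep a candidate only if it beats the nest it would replace.
        candidates = np.clip(candidates, lower, upper)
        cand_fit = np.array([objective(x) for x in candidates])
        improved = cand_fit < fitness
        nests[improved] = candidates[improved]
        fitness[improved] = cand_fit[improved]

    for _ in range(n_iter):
        best = nests[fitness.argmin()].copy()
        # 1) Levy flights, scaled by each nest's distance to the current best nest.
        greedy_update(nests + alpha * levy_step(rng, beta, (n_nests, dim)) * (nests - best))
        # 2) Each component is "discovered" with probability pa and moved by a
        #    random walk built from the difference of two shuffled nests.
        discovered = rng.random((n_nests, dim)) < pa
        walk = rng.random() * (nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)])
        greedy_update(nests + walk * discovered)

    i = fitness.argmin()
    return nests[i].copy(), float(fitness[i])


def toy_validation_error(p):
    # Hypothetical smooth stand-in for a classifier's validation error over two
    # hyperparameters (e.g. log10 learning rate and dropout), minimized at (-3, 0.3).
    return (p[0] + 3.0) ** 2 + (p[1] - 0.3) ** 2


best, best_f = cuckoo_search(toy_validation_error, lower=[-6.0, 0.0], upper=[-1.0, 0.8])
print("best parameters:", best, "error:", best_f)
```

Greedy acceptance keeps the search monotone. Each nest's error can only decrease, so the returned nest is the best point evaluated. A real hyperparameter search would replace the toy objective with a full train-and-validate run, which is far more expensive per evaluation.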
The simulation results show that the proposed Mask-R-CNN-HCO method attains accuracy higher by 24.16%, 32.18%, 28.43%, 36.4%, and 33.26%, sensitivity higher by 37.68%, 33.80%, 24.16%, 32.18%, and 28.43%, and precision higher by 35.88%, 24.16%, 32.18%, 28.43%, and 26.77% than the existing methods, respectively. These methods are speaker identification using the K-Nearest Neighbors algorithm (KNN), a multiclass support vector machine (MCSVM), a Gaussian Mixture Model-Convolutional Neural Network (GMMCNN) classifier, a deep neural network (DNN), and a Gaussian Mixture Model-Deep Neural Network (GMMDNN) classifier.
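The abstract does not say how accuracy, sensitivity, and precision are averaged over speakers. Nor does it say whether "X% higher" means percentage points or a relative improvement. As a hedged illustration, the sketch computes the three metrics from a multiclass confusion matrix with macro averaging, which is one common convention and is assumed here. It also shows both readings of an improvement. The confusion matrix and the 0.90/0.72 scores are invented toy values, and `speaker_metrics` and `improvement` are hypothetical helper names.

```python
import numpy as np


def speaker_metrics(conf):
    """Accuracy plus macro-averaged sensitivity (recall) and precision.

    conf[i, j] counts test utterances of true speaker i predicted as speaker j.
    Assumes every speaker appears as a true label and is predicted at least once.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                              # correctly identified utterances
    accuracy = tp.sum() / conf.sum()
    sensitivity = np.mean(tp / conf.sum(axis=1))    # TP / (TP + FN) per true speaker
    precision = np.mean(tp / conf.sum(axis=0))      # TP / (TP + FP) per predicted speaker
    return float(accuracy), float(sensitivity), float(precision)


def improvement(new, old):
    """Two readings of 'X% higher': absolute percentage points and relative gain (%)."""
    return 100.0 * (new - old), 100.0 * (new - old) / old


# Toy 3-speaker confusion matrix (invented for illustration, not from the paper).
conf = [[8, 2, 0],
        [1, 9, 0],
        [3, 0, 7]]
acc, sens, prec = speaker_metrics(conf)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} precision={prec:.3f}")
print(improvement(0.90, 0.72))
```

Under the two readings, going from 72% to 90% is an 18-point gain but a 25% relative gain. Reported improvements are therefore only comparable once the convention is known.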

An efficient speaker identification framework based on Mask R-CNN classifier parameter optimized using hosted cuckoo optimization (HCO)