Abstract: Speaker identification is the task of identifying a person from their voice with the help of artificial intelligence (AI) methods. The technology is widely used in voice recognition, security, surveillance, electronic voice eavesdropping, and identity verification. Existing methods do not provide sufficient accuracy or robustness for speech signals. To overcome these issues, this manuscript proposes an efficient speaker identification framework based on a Mask region-based convolutional neural network (Mask R-CNN) classifier whose parameters are optimized using Hosted Cuckoo Optimization (HCO). The objective of the proposed method is to increase accuracy and to improve the robustness of the signal. First, the input speech signals are taken from a real-time dataset. Four types of features are then extracted from each input speech signal to improve robustness: Mel Frequency Differential Power Cepstral Coefficients (MFDPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Power Normalized Cepstral Coefficients (PNCC), and spectral entropy. Next, the speaker ID is classified using the Mask R-CNN classifier, whose parameters are optimized using the HCO algorithm. The method is relevant to real-time applications such as telephone banking and fax mailing. The simulation is executed in MATLAB.
The simulation results show that the proposed Mask-R-CNN-HCO method attains accuracy that is 24.16%, 32.18%, 28.43%, 36.4%, and 33.26% higher, sensitivity that is 37.68%, 33.80%, 24.16%, 32.18%, and 28.43% higher, and precision that is 35.88%, 24.16%, 32.18%, 28.43%, and 26.77% higher than existing methods, such as speaker identification using the K-Nearest Neighbors (KNN) algorithm, a multiclass support vector machine (MCSVM), a Gaussian Mixture Model–Convolutional Neural Network (GMMCNN) classifier, a deep neural network (DNN), and a Gaussian Mixture Model–Deep Neural Network (GMMDNN) classifier.
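Of the four features named in the abstract, spectral entropy is the simplest to illustrate. The sketch below is written in Python rather than the MATLAB used for the paper's simulation, and it assumes a common definition: the Shannon entropy of a frame's normalized power spectrum, scaled to the range 0 to 1. The abstract does not give the paper's exact formulation, so the function names, the naive DFT, and the normalization here are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT.

    Assumption: a plain DFT is used only to keep the sketch dependency-free;
    a real pipeline would use an FFT and a window function.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Normalized Shannon entropy of the frame's power spectrum.

    Returns a value in [0, 1]: near 0 when the energy sits in one bin
    (tonal frame), near 1 when it is spread evenly (noise-like frame).
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total <= eps or len(power) < 2:
        return 0.0  # silent or degenerate frame: define entropy as zero
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > eps)
    return entropy / math.log2(len(probs))


# Hypothetical 64-sample frames used only for illustration.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63

tone_entropy = spectral_entropy(tone)        # energy in a single bin
impulse_entropy = spectral_entropy(impulse)  # flat spectrum
```

A pure tone concentrates its energy in one bin and yields a low entropy, while an impulse has a flat spectrum and yields the maximum. This contrast is what makes the feature useful for separating structured, voiced speech from noise-like frames.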

Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health

Explore Training of Deep Convolutional Neural Networks on Battery-powered Mobile Devices: Design and Application

Close the Gap Between Deep Learning and Mobile Intelligence by Incorporating Training in the Loop

Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking

An Energy-Efficient Binarized Neural Network Using Analog-Intensive Feature Extraction for Keyword and Speaker Verification Wakeup

Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition

A Binaural Deep Neural Networks Parameter Mask for the Robust Automatic Speech Recognition System

Vision-referential speech enhancement of an audio signal using mask information captured as visual data

Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

Optimization of DNN-based speaker verification model through efficient quantization technique

LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification

An efficient speaker identification framework based on Mask R-CNN classifier parameter optimized using hosted cuckoo optimization (HCO)

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

AM-MobileNet1D: A Portable Model for Speaker Recognition

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression

Exploratory Evaluation of Speech Content Masking

More is Less: Domain-Specific Speech Recognition Microprocessor Using One-Dimensional Convolutional Recurrent Neural Network

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor with On-Chip Self-Learning

Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise