Abstract: Automatic voice authentication based on deep learning is a promising technology that has received much attention from academia and industry. It has proven effective in a variety of applications, including biometric access control systems. Using biometric data in such systems is difficult, particularly in a centralized setting, because it introduces risks such as information disclosure, unreliability, and security and privacy concerns. Voice authentication systems are becoming increasingly important in addressing these issues, especially when the device relies on voice commands from the user. This work investigates the development of a text-independent voice authentication system. The spatial features of the voiceprint, which correspond to the speech spectrum, are captured by the spectrogram of the speech signal, and weighted wavelet packet cepstral coefficients (W-WPCC) are effective for extracting them. W-WPCC features are calculated by combining sub-band energies with sub-band spectral centroids through a weighting scheme, which yields noise-resistant acoustic features. In addition, this work proposes an enhanced Inception v3 model for voice authentication. The proposed model extracts features from the input data through its convolutional and pooling layers. By employing fewer parameters, this architecture reduces the complexity of the convolution process while increasing learning speed. After training, the enhanced Inception v3 model classifies audio samples as authenticated or not based on the extracted features. Experiments were carried out on the speech of five English speakers whose voices were collected from YouTube. The results show that the proposed method, based on the enhanced Inception v3 and trained on speech spectrogram images, outperforms existing methods, achieving an average classification accuracy of 99%. Compared with the other network models evaluated on the same dataset, the proposed enhanced Inception v3 network model achieves the best results in model training time, recognition accuracy, and stability.
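
The abstract describes W-WPCC only at a high level (sub-band energies combined with sub-band spectral centroids through a weighting scheme) and gives no formulas. The following is therefore a minimal sketch under stated assumptions, not the authors' method: Haar filters, three decomposition levels, a DFT-based centroid normalized within each band, and a hypothetical mixing weight `alpha` are all illustrative choices, as are the function names.

```python
import math

def haar_split(x):
    """One Haar analysis step: (low-pass, high-pass) halves of x."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def wavelet_packet_subbands(frame, levels):
    """Full wavelet packet tree: 2**levels sub-bands in frequency order."""
    nodes = [list(frame)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    # Tree order is not frequency order; a Gray-code permutation reorders it.
    return [nodes[p ^ (p >> 1)] for p in range(len(nodes))]

def power_spectrum(frame):
    """Naive DFT power for bins 0 .. n/2 - 1 (fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def subband_centroids(frame, n_bands):
    """Power-weighted mean bin position inside each band, scaled to [0, 1]."""
    spec = power_spectrum(frame)
    width = len(spec) // n_bands
    centroids = []
    for b in range(n_bands):
        band = spec[b * width:(b + 1) * width]
        total = sum(band)
        # Fall back to mid-band when the band carries no power.
        pos = sum(i * p for i, p in enumerate(band)) / total if total > 0 else (width - 1) / 2
        centroids.append(pos / (width - 1) if width > 1 else 0.5)
    return centroids

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, the usual last step of a cepstral pipeline."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def wwpcc_sketch(frame, levels=3, alpha=0.3, n_coeffs=6):
    """Assumed weighting: convex mix of log energy and normalized centroid."""
    bands = wavelet_packet_subbands(frame, levels)
    energies = [sum(c * c for c in band) / len(band) for band in bands]
    centroids = subband_centroids(frame, len(bands))
    weighted = [(1 - alpha) * math.log(e + 1e-10) + alpha * c
                for e, c in zip(energies, centroids)]
    return dct2(weighted, n_coeffs)

# Toy frame: 64 samples of a sinusoid with exactly 10 cycles per frame.
frame = [math.sin(2 * math.pi * 10 * t / 64) for t in range(64)]
features = wwpcc_sketch(frame)
```

The frame length must be divisible by `2 ** (levels + 1)` for the bands to line up. In practice a smoother wavelet than Haar and a tuned weighting would likely be used; the sketch only shows how the energy and centroid streams could be merged before the cepstral transform.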

An Efficient Voice Authentication System using Enhanced Inceptionv3 Algorithm
