Abstract: The relationship between muscle movements and neural signals makes it possible to decode silent speech from neuromuscular activity. The decoding can be formulated as a supervised classification task. Electromyography (EMG) signals captured from surface articulatory muscles contain information that can assist in decoding speech. Spectrograms obtained from EMG carry a wealth of information relevant to the decoding, but they have not yet been fully explored. In addition, the decoding results are often uncertain, so it is important to quantify the prediction confidence. This paper aims to improve decoding performance by representing time series signals as spectrograms and by using Inductive Conformal Prediction (ICP) to provide predictions with confidence. All EMG data are recorded from six dedicated facial muscles while participants recite the displayed words subvocally. Three pre-trained convolutional models (MobileNet-V1, ResNet18 and Xception) are used to extract features from the spectrograms for classification. Both bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU) classifiers are used for prediction. Furthermore, an ICP decoder based on Bi-LSTM is built to provide guaranteed predictions for each example at a specified confidence level. The proposed method, which combines feature extraction based on Xception with classification using Bi-LSTM, achieves an accuracy of 0.87, higher than the other methods. ICP outputs confidence measures for each example that can help users evaluate the reliability of new predictions. Experimental results demonstrate the practical usefulness of the approach in decoding articulatory neuromuscular activity and the advantages of applying ICP.
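To make the ICP step more concrete, the following is a minimal sketch of inductive conformal prediction on top of any classifier that outputs class probabilities. It is not the paper's implementation: the nonconformity measure (one minus the probability assigned to a label), the function names `icp_p_values` and `predict_with_confidence`, and the toy numbers are all assumptions made for illustration. In the setting described above, the probability arrays would presumably come from the Bi-LSTM's softmax outputs on a held-out calibration set and on new examples.

```python
import numpy as np

def icp_p_values(cal_probs, cal_labels, test_probs):
    """Return an (n_test, n_classes) array of conformal p-values.

    Assumed nonconformity measure: 1 - probability of the candidate label.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_probs = np.asarray(test_probs, dtype=float)
    # Nonconformity of each calibration example with respect to its true label.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Nonconformity of each test example under every candidate label.
    test_scores = 1.0 - test_probs
    # Count calibration scores at least as large as each test score.
    n_ge = (cal_scores[None, None, :] >= test_scores[:, :, None]).sum(axis=2)
    # The +1 terms account for the test example itself.
    return (n_ge + 1) / (len(cal_scores) + 1)

def predict_with_confidence(p_values, significance=0.05):
    """Derive prediction sets, forced predictions, confidence and credibility."""
    pred_sets = p_values > significance      # labels not rejected at this level
    ordered = np.sort(p_values, axis=1)
    forced = p_values.argmax(axis=1)         # single most plausible label
    confidence = 1.0 - ordered[:, -2]        # 1 - second-largest p-value
    credibility = ordered[:, -1]             # largest p-value
    return pred_sets, forced, confidence, credibility

# Toy calibration set with two classes (hypothetical values).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
cal_labels = [0, 0, 1, 1]
test_probs = [[0.95, 0.05]]

p = icp_p_values(cal_probs, cal_labels, test_probs)
sets, forced, conf, cred = predict_with_confidence(p, significance=0.25)
print(p, sets, forced, conf, cred)
```

At significance level ε, the prediction set contains every label whose p-value exceeds ε. Under the usual exchangeability assumption, such sets miss the true label with probability at most ε, which is the sense in which conformal predictions carry a guarantee at a specified confidence level.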

Speaker Recognition Method Based on Statistical Features of Spectrograms and CNN

Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM.

A Speaker Recognition Method Based on Stable Learning.

An Interpretable and Generalizable Speech Detector Based on a CNN-LSTM Framework

Speaker Recognition Based on Long Short-Term Memory Networks

Speaker-Independent Speech Emotion Recognition Based on CNN-BLSTM and Multiple SVMs

Self-attention Based Speaker Recognition Using Cluster-Range Loss

Auditory Model Based Speech Feature Extraction and Its Application to Speaker Identification

A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

A stacked convolutional neural network framework with multi-scale attention mechanism for text-independent voiceprint recognition

GMM and CNN Hybrid Method for Short Utterance Speaker Recognition

Speaker Recognition Using Wavelet Cepstral Coefficient, I-Vector, and Cosine Distance Scoring and Its Application for Forensics.

Speaker Recognition Technology Based on Lip Movement

Robust Speaker Identification Using An Auditory-Based Feature

High-Level CNN and Machine Learning Methods for Speaker Recognition

Voiceprint recognition system based on auditory characteristics

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

3D Convolutional Neural Networks Based Speaker Identification and Authentication.

Speaker Recognition on Mobile Phone: Using Wavelet, Cepstral Coefficients and Probabilistic Neural Network

CNN with Phonetic Attention for Text-Independent Speaker Verification.