Abstract: Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which an auditory prosthesis improves the user's speech comprehension. Although behavior-based speech intelligibility is the gold standard, its precision is limited by its subjectivity. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to create relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100% in steps of 2.5%. Three scores, namely 30.0, 37.5, and 40.0%, were not collected. Two speech features, the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on a dataset of event-related potentials (ERPs) or of correlation coefficients between the ERPs and ENV, between the ERPs and PH onset, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. According to the occlusion sensitivity maps, the informative electrodes of the ENV models were located in the occipital area, whereas those of the phoneme models, i.e., PH and PHENV, were located in the language processing area. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy. This model may promote clinical prediction of speech intelligibility with a comfortable speech intelligibility test.
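
The score grid and the correlation features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic signals, and the choice of a zero-lag Pearson correlation per EEG channel are assumptions made here for illustration, since the abstract does not specify lags, windowing, or preprocessing.

```python
import numpy as np

def score_levels(missing=(30.0, 37.5, 40.0)):
    """Return the 41 score levels (0 to 100% in 2.5% steps) and the
    subset that was actually collected (three levels are absent)."""
    levels = np.linspace(0.0, 100.0, 41)
    collected = np.array([s for s in levels if s not in missing])
    return levels, collected

def phenv_feature(ph_onset, env):
    """PHENV as described in the abstract: the product of PH onset and ENV."""
    return ph_onset * env

def channel_correlations(eeg, feature):
    """Zero-lag Pearson correlation between each EEG channel and a feature.

    eeg: array of shape (channels, samples)
    feature: array of shape (samples,)
    Assumption: zero lag and no windowing; the paper may differ.
    """
    eeg_c = eeg - eeg.mean(axis=1, keepdims=True)
    feat_c = feature - feature.mean()
    numerator = eeg_c @ feat_c
    denominator = np.linalg.norm(eeg_c, axis=1) * np.linalg.norm(feat_c)
    return numerator / denominator

# Synthetic stand-ins for real recordings (hypothetical data).
rng = np.random.default_rng(0)
n_channels, n_samples = 64, 1000
env = np.abs(rng.standard_normal(n_samples))              # envelope-like signal
ph_onset = (rng.random(n_samples) < 0.05).astype(float)   # sparse onset train
phenv = phenv_feature(ph_onset, env)
eeg = rng.standard_normal((n_channels, n_samples))
eeg[0] = phenv                                            # channel 0 tracks PHENV exactly

levels, collected = score_levels()
r_phenv = channel_correlations(eeg, phenv)
```

Under these assumptions, `collected` holds the 38 score levels left after removing the three that were not collected, and `r_phenv` holds one correlation value per channel, the kind of per-electrode input a classifier could be trained on.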

Predicting Viseme Parameters from Speech Based on Neural Network

Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM

Silent Speech Decoding Using Spectrogram Features Based on Neuromuscular Activities

Convolutional Neural Network applied in mime speech recognition using sEMG data

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

VisemeNet: Audio-Driven Animator-Centric Speech Animation

Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses

Decoding of the speech envelope from EEG using the VLAAI deep neural network

Low Level Descriptors Based DBLSTM Bottleneck Feature for Speech Driven Talking Avatar

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Improved Posterior Probability Estimation Methods for the Freely-Spoken Speech Evaluation

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks

Extracting Spatial Muscle Activation Patterns in Facial and Neck Muscles for Silent Speech Recognition Using High-Density sEMG

Speech Driven MPEG-4 Based Face Animation via Neural Network

Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

An Effective Conversion of Visemes to Words for High-Performance Automatic Lipreading

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics

Correlation Between Audio–visual Enhancement of Speech in Different Noise Environments and SNR: A Combined Behavioral and Electrophysiological Study

Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck