Abstract: The advantages and limitations of using automatic speech recognition (ASR) techniques to model human speech recognition are investigated for a set of "critical" speech maskers for which many standard models of human speech recognition fail. A deep neural network (DNN)-based ASR system applied to a closed-set sentence recognition test is used to model the speech recognition threshold (SRT) of normal-hearing listeners for a variety of noise types. The benchmark data from Schubotz et al. (2016) include SRTs measured in conditions of increasing complexity in terms of spectro-temporal modulation (from stationary speech-shaped noise to a single interfering talker). The DNN-based model proposed in Spille et al. (2018) achieves higher prediction accuracy than the baseline models (i.e., SII, ESII, STOI, and mr-sEPSM), even though it does not require a clean speech reference signal (which most auditory model-based SRT predictions do require). The most accurate predictions are obtained with multi-condition training on known noise types and with ASR features that explicitly account for temporal modulations in noisy sentences. Another advantage of the approach is that the DNN can serve as a valuable analysis tool for uncovering signal recognition strategies: for instance, identifying the cues most relevant for correct classification in modulated noise shows that the DNN is listening in the dips. Finally, we present preliminary data indicating that the word error rate (WER) of the model can be replaced with an estimate of the WER that does not require transcripts of the utterances at test time. This removes an important limitation of the previous model that prevented it from being used in real-world scenarios.
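As a rough illustration of how an SRT could be read off an ASR system's error rates, the sketch below linearly interpolates the SNR at which the WER crosses a target value. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the 50% criterion, the function name `srt_from_wer`, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def srt_from_wer(
    snrs_db: Sequence[float],
    wers: Sequence[float],
    target_wer: float = 0.5,  # assumption: SRT defined at 50% word errors
) -> float:
    """Linearly interpolate the SNR (dB) at which the WER crosses target_wer."""
    # Sort the measurement points by SNR so neighbours can be compared.
    points = sorted(zip(snrs_db, wers))
    for (snr_lo, wer_lo), (snr_hi, wer_hi) in zip(points, points[1:]):
        # WER is expected to fall as SNR rises; find the bracketing pair.
        if wer_lo >= target_wer >= wer_hi and wer_lo != wer_hi:
            frac = (wer_lo - target_wer) / (wer_lo - wer_hi)
            return snr_lo + frac * (snr_hi - snr_lo)
    raise ValueError("target WER is not bracketed by the measured points")


# Hypothetical WER-vs-SNR values, invented purely for illustration.
snrs = [-12.0, -9.0, -6.0, -3.0, 0.0]
wers = [0.95, 0.80, 0.40, 0.10, 0.02]
srt = srt_from_wer(snrs, wers)
print(f"Estimated SRT: {srt:.2f} dB SNR")
```

Linear interpolation keeps the sketch short; fitting a psychometric function to the WER-vs-SNR points would be a common alternative when the measurements are noisy.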

Non-intrusive speech intelligibility prediction using automatic speech recognition derived measures

Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

Prediction of speech intelligibility with DNN-based performance measures

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

Objective Measures for Predicting Speech Intelligibility in Noisy Conditions Based on New Band-Importance Functions

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech

Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

A Data-Driven Non-Intrusive Measure of Speech Quality and Intelligibility

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Non-intrusive intelligibility prediction for Mandarin speech in noise

Non-intrusive speech quality assessment using neural networks

Monaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Modelling human speech recognition in challenging noise maskers using machine learning

HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network

Nonintrusive objective measurement of speech intelligibility: A review of methodology

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model