An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility

Miguel Fernández-Díaz, Ascensión Gallardo-Antolín
DOI: https://doi.org/10.1016/j.engappai.2020.103976
2024-02-05
Abstract: Speech intelligibility can be degraded by multiple factors, such as noisy environments, technical difficulties or biological conditions. This work focuses on the development of an automatic non-intrusive system for predicting the speech intelligibility level in the last of these cases. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the frames most relevant to this task. The proposed models are evaluated on the UA-Speech database, which contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with Mean-Pooling.
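To illustrate the difference between Mean-Pooling and attention over frames mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the attention weights come from a softmax over dot products between each frame vector and a scoring vector `w`; that form, the function names, and the toy values are all assumptions made for illustration, since the abstract does not specify the mechanism. In a real model, the frame vectors would be LSTM outputs and `w` would be learned.

```python
import math

def mean_pooling(frames):
    """Average the per-frame vectors; every frame gets weight 1/T."""
    T = len(frames)
    return [sum(col) / T for col in zip(*frames)]

def attention_pooling(frames, w):
    """Softmax-weighted average of the per-frame vectors.

    frames: list of T vectors of size D (stand-ins for LSTM outputs).
    w: hypothetical scoring vector of size D (learned in a real model).
    Returns the pooled vector and the per-frame weights.
    """
    # One scalar relevance score per frame (assumed dot-product scoring).
    scores = [sum(x * wi for x, wi in zip(f, w)) for f in frames]
    # Softmax over time, subtracting the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Weighted sum of frames, computed one feature dimension at a time.
    pooled = [sum(a * x for a, x in zip(alpha, col)) for col in zip(*frames)]
    return pooled, alpha

# Toy input: 3 frames with 2 features each (made-up values).
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
w = [2.0, 2.0]

utterance_mean = mean_pooling(frames)
utterance_att, alpha = attention_pooling(frames, w)
```

In this sketch the middle frame receives the largest weight because it has the highest score, whereas Mean-Pooling treats all frames equally. With an all-zero `w`, the attention weights become uniform and the two poolings coincide.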
Audio and Speech Processing, Machine Learning, Sound
What problem does this paper attempt to address?