Abstract: Traditionally, speech quality evaluation relies on subjective assessments or intrusive methods that require reference signals or additional equipment. In recent years, however, non-intrusive speech quality assessment has emerged as a promising alternative, attracting considerable attention from researchers and industry professionals. This article presents a deep learning-based method that exploits large-scale intrusive simulated data to improve the accuracy and generalization of non-intrusive methods. The major contributions of this article are as follows. First, it presents a data simulation method, which generates degraded speech signals and labels their speech quality with the perceptual objective listening quality assessment (POLQA). The generated data is shown to be useful for pretraining the deep learning models. Second, it proposes applying an adversarial speaker classifier to reduce the impact of speaker-dependent information on speech quality evaluation. Third, an autoencoder-based deep learning scheme is proposed following the principles of representation learning and adversarial training (AT), which is able to transfer the knowledge learned from a large amount of simulated speech data labeled by POLQA. With the help of discriminative representations extracted from the autoencoder, the prediction model can be trained well on a relatively small amount of speech data labeled through subjective listening tests. Fourth, an end-to-end speech quality evaluation neural network is developed, which takes magnitude and phase spectral features as its inputs. This phase-aware model is more accurate than the model using only the magnitude spectral features. Extensive experiments are carried out with three datasets: one simulated with labels obtained using POLQA and two recorded with labels obtained using subjective listening tests. The results show that the presented phase-aware method improves the performance of the baseline model, and that the proposed model with latent representations extracted from the adversarial autoencoder (AAE) outperforms the state-of-the-art objective quality assessment methods, reducing the root mean square error (RMSE) by 10.5% and 12.2% on the Beijing Institute of Technology (BIT) dataset and Tencent Corpus, respectively. The code and supplementary materials are available at https://github.com/liushenme/AAE-SQA.
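As an illustration of the magnitude and phase spectral features mentioned in the abstract, the following is a minimal sketch of how such inputs could be computed from a waveform with a short-time Fourier transform. It is not the authors' implementation: the function name `stft_mag_phase`, the Hann window, and the `frame_len` and `hop` values are assumptions chosen for the example, since the abstract does not state the feature settings.

```python
import numpy as np

def stft_mag_phase(signal, frame_len=512, hop=256):
    """Return (magnitude, phase) arrays of shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame gives the complex spectrum.
    spectrum = np.fft.rfft(frames, axis=1)
    # Magnitude and phase (in radians, within [-pi, pi]) are the two feature streams.
    return np.abs(spectrum), np.angle(spectrum)

# Usage: a 1-second, 1 kHz tone sampled at 16 kHz (synthetic test signal).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
mag, phase = stft_mag_phase(tone)
print(mag.shape, phase.shape)
```

A magnitude-only model would use just the first array; a phase-aware model of the kind the abstract describes would receive both.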

A non-invasive speech quality evaluation algorithm for hearing aids with multi-head self-attention and audiogram-based features

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App

Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network

A Supervised Speech Enhancement Method for Smartphone-Based Binaural Hearing Aids

A Speech Enhancement Method Combining Beamforming with RNN for Hearing Aids

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

A Low-Latency Hybrid Multi-Channel Speech Enhancement System for Hearing Aids

Evaluation of Frequency-Lowering Algorithms for Intelligibility of Chinese Speech in Hearing-Aid Users

A Deep Learning-Based Time-Domain Approach for Non-Intrusive Speech Quality Assessment

Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Restoring speech intelligibility for hearing aid users with deep learning

Real-Time Implementation of an Efficient Speech Enhancement Algorithm for Digital Hearing Aids

An objective evaluation of Hearing Aids and DNN-based speech enhancement in complex acoustic scenes

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

DHASP: Differentiable Hearing Aid Speech Processing

Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement

An Improved Frequency-Shift Compression Method Based on Auto Energy Gain Compensation for Digital Hearing Aids

Spectral-change Enhancement with Prior SNR for the Hearing Impaired