Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization
Hang Chen, Jun Du, Zhe Wang, Chenxi Wang, Yuling Ren, Qinglong Li, Ruibo Liu, Chin-Hui Lee
DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317166
2023-01-01
Abstract: We propose a correlated multi-level optimization approach to speech enhancement that improves recognition performance for high-performance acoustic models in real-world applications. The approach combines three loss functions: mean squared error of the mask, scale-invariant source-to-noise ratio, and cross-entropy. It also adopts the Pearson correlation coefficient between these losses as part of the optimization goal, so that training aims not only to reduce each loss but also to increase the correlation between them. Experiments on continuous Mandarin recognition in mobile phone scenarios show a relative reduction of about 25.29% in the average character error rate across five signal-to-noise ratio levels. Notably, the approach improves objective perceptual quality and intelligibility measures as well as recognition accuracy, surpassing some advanced speech enhancement techniques in the context of automatic speech recognition.
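The abstract names the ingredients of the objective (mask MSE, scale-invariant source-to-noise ratio, cross-entropy, and a Pearson correlation term) but not how they are combined. The following is a minimal NumPy sketch of one possible combination, not the authors' formulation. Everything specific in it is an assumption made for illustration: the function names, the per-utterance loss vectors, the equal default weights, the `corr_weight` parameter, and the choice to subtract the mean pairwise correlation from the weighted loss sum.

```python
import numpy as np


def mask_mse(est_mask, ref_mask):
    """Per-utterance mask MSE; inputs have shape (batch, frames, bins)."""
    return np.mean((est_mask - ref_mask) ** 2, axis=(1, 2))


def neg_si_snr(est, ref, eps=1e-8):
    """Per-utterance negative SI-SNR in dB; inputs have shape (batch, samples)."""
    est = est - est.mean(axis=1, keepdims=True)
    ref = ref - ref.mean(axis=1, keepdims=True)
    # Project the estimate onto the reference to get the scaled target.
    scale = np.sum(est * ref, axis=1, keepdims=True) / (
        np.sum(ref ** 2, axis=1, keepdims=True) + eps
    )
    target = scale * ref
    noise = est - target
    ratio = np.sum(target ** 2, axis=1) / (np.sum(noise ** 2, axis=1) + eps)
    return -10.0 * np.log10(ratio + eps)


def cross_entropy(logits, labels):
    """Per-utterance frame-averaged CE; logits (batch, frames, classes), labels (batch, frames)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    return -picked.mean(axis=1)


def pearson(x, y, eps=1e-8):
    """Pearson correlation coefficient between two 1-D vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)) + eps))


def correlated_multi_level_loss(l_mask, l_wave, l_feat,
                                weights=(1.0, 1.0, 1.0), corr_weight=0.1):
    """Weighted sum of mean losses minus a reward for their mean pairwise correlation.

    Assumption: the correlation is measured across the utterances of a batch,
    and a higher correlation lowers the total objective.
    """
    base = sum(w * l.mean() for w, l in zip(weights, (l_mask, l_wave, l_feat)))
    corr = (pearson(l_mask, l_wave)
            + pearson(l_mask, l_feat)
            + pearson(l_wave, l_feat)) / 3.0
    return float(base - corr_weight * corr), corr
```

In this sketch, minimizing the returned total pushes each loss down and pushes the three per-utterance loss vectors toward moving together, which is the behavior the abstract describes. A trainable version would need the same computation in an automatic differentiation framework.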