Improving Limited Resource Speech Recognition Performance with Latent Regression Bayesian Network

Liang Xu, Yue Zhao, Xiaona Xu, Yigang Liu, Qiang Ji
DOI: https://doi.org/10.1007/978-3-031-44198-1_32
2023-01-01
Abstract: In limited-resource speech recognition scenarios, the scarcity of training data may result in overfitting and decreased recognition rates when traditional acoustic features are employed. To enhance speech recognition performance, it is essential to extract representative and robust features from speech signals. This paper explores the latent regression Bayesian network (LRBN) as a way to derive more efficient speech representations from traditional acoustic features for training end-to-end speech recognition models. The LRBN is an effective generative model that captures the inherent dependencies in the original data. To evaluate the effectiveness of LRBN for speech representation, we compare traditional acoustic features and bottleneck features with the hidden features extracted by LRBN. Our experimental results demonstrate that LRBN improves speech recognition accuracy on five speech datasets.