Pronunciation guided copy and correction model for ASR error correction
Wang, Wenjun; Yu, Zhengtao; Huang, Yuxin; Guo, Junjun; Zhou, Guojiang
DOI: https://doi.org/10.1007/s13042-024-02191-7
2024-06-11
International Journal of Machine Learning and Cybernetics
Abstract: Error correction has proven to be an effective means of refining mistakes produced by Automatic Speech Recognition (ASR) models, contributing to a notable reduction in Word Error Rate (WER) at the ASR post-editing stage. Existing ASR error correction methods built on sequence-to-sequence architectures may suffer from over-correction, introducing new mistakes or altering portions that were already correct. In this paper, we propose a Pronunciation Guided Copy and Correction (PGCC) model for ASR error correction. Leveraging the fact that ASR hypotheses overlap heavily with the correct text and are frequently characterized by homophone errors, our approach incorporates a copy module into the encoder-decoder structure of the pre-trained BART model. This module decides whether to retain a token from the source input (via copying) or to generate a modified one through the decoder. Furthermore, a hierarchical phonetic feature encoder is designed to guide the copy module and the BART decoder, implicitly identifying the positions of homophone errors and generating precise corrections. Experiments on two public datasets demonstrate the effectiveness of the proposed method, which reduces character error rate by 18.18% and 44.84%, respectively, and outperforms solid baseline models.
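The copy-or-generate decision described in the abstract can be illustrated with a minimal sketch. It is not the PGCC implementation: the abstract does not give the model's equations, so the code below assumes a generic pointer-generator-style mixture, in which a gate `p_copy` blends an attention distribution over source tokens with the decoder's vocabulary distribution. The function name `mix_copy_and_generate`, the toy tokens, and all probabilities are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def mix_copy_and_generate(
    source_tokens: List[str],
    copy_attention: List[float],
    generate_probs: Dict[str, float],
    p_copy: float,
) -> Dict[str, float]:
    """Blend a copy distribution over source tokens with a generation distribution.

    Assumed (hypothetical) formulation, not taken from the paper:
        P(w) = p_copy * sum(attention on source positions holding w)
               + (1 - p_copy) * P_generate(w)
    """
    if len(source_tokens) != len(copy_attention):
        raise ValueError("one attention weight is required per source token")
    if not 0.0 <= p_copy <= 1.0:
        raise ValueError("p_copy must lie in [0, 1]")

    # Start from the generation distribution, scaled by the generate share.
    mixed = {tok: (1.0 - p_copy) * prob for tok, prob in generate_probs.items()}

    # Add the copy share: each source position contributes its attention weight
    # to the token found at that position.
    for tok, weight in zip(source_tokens, copy_attention):
        mixed[tok] = mixed.get(tok, 0.0) + p_copy * weight
    return mixed


# Toy ASR hypothesis containing a homophone ("here" where "hear" may be intended).
source = ["I", "here", "you"]
attention = [0.1, 0.8, 0.1]  # made-up copy attention for the second output step
vocab_probs = {"I": 0.05, "hear": 0.7, "here": 0.2, "you": 0.05}  # made-up decoder output

# A high copy gate keeps the source token; a low gate lets the decoder correct it.
keep = mix_copy_and_generate(source, attention, vocab_probs, p_copy=0.9)
fix = mix_copy_and_generate(source, attention, vocab_probs, p_copy=0.1)

best_keep = max(keep, key=keep.get)
best_fix = max(fix, key=fix.get)
print(best_keep, best_fix)
```

In this sketch the gate value is supplied by hand. In the model described above, the phonetic feature encoder is what guides the copy module, so a gate of this kind would presumably be lower at positions that look like homophone errors.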
computer science, artificial intelligence