Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer

Jinpeng Li,Xie Chen,Weiqiang Zhang
DOI: https://doi.org/10.1007/s12204-024-2730-3
2024-04-24
Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University
Abstract:The lexicon is an essential component in the hybrid automatic speech recognition (ASR) system. However, a high-quality lexicon requires significant efforts from the linguistic experts and is difficult to obtain, especially for low-resource languages. This paper addresses the problem of using a well-trained universal phone recognizer, obtained through the training of multilingual speech data and pronunciation lexicons, to generate pronunciation lexicons for low-resource languages driven by speech data. We propose a simple pipeline that utilizes this approach to generate pronunciation lexicons and apply them into ASR systems. The steps to generate the lexicon are simple and generic: apply the International Phonetic Alphabet (IPA) phone recognizer on the speech, then align it with the reference word sequence, followed by filtering to obtain a series of AUTO-subwords, using them to generate the AUTO-subword lexicon and the AUTO-IPA lexicon. We used the pronunciation lexicon generated for the hybrid system and for fine-tuning the pre-trained model. According to the experiment results, we are able to construct the lexicon without resourcing to linguistic experts. Furthermore, the generated lexicon is able to outperform grapheme-based lexicon and is comparable to expert lexicon.
What problem does this paper attempt to address?