Listen, Know and Spell: Knowledge-Infused Subword Modeling for Improving ASR Performance of OOV Named Entities

Nilaksh Das, Duen Horng Chau, Monica Sunkara, Sravan Bodapati, Dhanush Bekal, Katrin Kirchhoff
DOI: https://doi.org/10.1109/icassp43922.2022.9746748
2022-05-23
Abstract: Automatic speech recognition (ASR) is increasingly being used in specialized domains such as medical ASR and news transcription. Owing to the lack of high-quality annotated speech data in such domains, off-the-shelf models are commonly employed by fine-tuning on domain-specific data. This poses a significant challenge in transcribing long-tail expressions and out-of-vocabulary (OOV) named entities. On the other hand, readily available knowledge graphs (KGs) provide semantically structured knowledge for such domain-specific named entities. In this work, we propose the Knowledge-Infused Subword Model (KISM), a novel technique for incorporating semantic context from KGs into the ASR pipeline for improving the performance of OOV named entities. Our experiments show that KISM improves OOV recall of an ASR model by 4.58% (absolute) for named entities that were not seen during training.