Similar Word Model for Unfrequent Word Enhancement in Speech Recognition

Xi Ma, Dong Wang, Javier Tejedor
DOI: https://doi.org/10.1109/taslp.2016.2585863
2016-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract: The popular n-gram language model (LM) is weak for unfrequent words. Conventional approaches such as class-based LMs pre-define some sharing structures (e.g., word classes) to solve the problem. However, defining such structures requires prior knowledge, and the context sharing based on these structures is generally inaccurate. This paper presents a novel similar word model to enhance unfrequent words. In principle, we enrich the context of an unfrequent word by borrowing context information from some "similar words." Compared to conventional class-based methods, this new approach offers a fine-grained context sharing by referring to words that best match the target word, and it is more flexible as no sharing structures need to be defined by hand. Experiments on a large-scale Chinese speech recognition task demonstrated that the similar word approach can improve performance on unfrequent words significantly, while keeping the performance on general tasks almost unchanged.
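The abstract does not say how similar words are selected or how the borrowed context is weighted. As a toy illustration of the general idea only, the sketch below lets a rare word borrow bigram counts from a hand-picked list of similar words. The function names, the `similar` mapping, the `weight` discount, and the example sentences are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def enriched_prob(word, prev, counts, similar, weight=0.5):
    """Estimate P(word | prev), letting `word` borrow counts from similar words.

    `similar` maps a rare word to a list of similar words, and `weight`
    discounts the borrowed counts. Both are assumptions of this sketch.
    """
    following = counts.get(prev, Counter())
    total = sum(following.values())
    if total == 0:
        return 0.0
    own = following[word]
    borrowed = sum(following[s] for s in similar.get(word, []))
    # Simplification: probabilities of the other words are not renormalized here.
    return (own + weight * borrowed) / (total + weight * borrowed)


# Hypothetical toy corpus: "kunming" never appears after "to".
sentences = [
    ["fly", "to", "beijing"],
    ["fly", "to", "beijing"],
    ["train", "to", "shanghai"],
    ["visit", "kunming"],
]
counts = bigram_counts(sentences)
similar = {"kunming": ["beijing", "shanghai"]}

p_plain = enriched_prob("kunming", "to", counts, {})
p_enriched = enriched_prob("kunming", "to", counts, similar)
print(p_plain, p_enriched)
```

In this toy corpus the rare word has no probability after "to" on its own, but receives a nonzero estimate once it shares the contexts of its similar words. A class-based LM would instead tie every word in a predefined class to the same shared context.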