Text Representation Model for Multiple Language Forms in Spoken Chinese Expression

Miao Hu, Junjie Peng, Wenqiang Zhang, Jingxiang Hu, Lizhe Qi, Huanxiang Zhang
DOI: https://doi.org/10.1142/s0218001422530044
IF: 1.261
2022-01-01
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: The mixing of multiple language forms in spoken Chinese is common but problematic: it makes intent understanding harder and hinders communication. Existing studies on intent recognition mainly focus on a single language form or on parallel multilingual text, and pay little attention to spoken texts that mix multiple language forms. Because it is hard to capture the semantics of an expression that mixes language forms, the problem deserves study. To address it, a text representation model is proposed for spoken Chinese expressions mixed with English and Chinese Pinyin. A feature matrix is built to mine the composition information of English and Pinyin. The model can efficiently distinguish English from Chinese Pinyin even though both kinds of fragment are written with English letters. It also handles hidden text information effectively, since that problem is transformed into the task of translating English and Pinyin into Chinese. To verify the model's performance, the texts it processes are used as the input of a classifier. Extensive experiments on a large online logistics manual customer service corpus show that the text representation model is correct and effective: it not only removes the obstacles caused by mixing multiple language forms but also yields better intent understanding results.