A Bilingual Corpus Based Approach to Chinese Abbreviation Extraction

LIU Youqiang, LI Bin, XI Ning, CHEN Jiajun
DOI: https://doi.org/10.3969/j.issn.1003-0077.2012.02.013
2012-01-01
Abstract: Chinese abbreviations are widely used in modern Chinese texts, and research on them is important for Chinese information processing. In this paper, we propose an approach to extracting Chinese abbreviations from a Chinese-English parallel corpus. First, we generate word alignments for the corpus and extract Chinese-English phrase pairs consistent with the alignments. Then, we use an SVM classifier to separate high-quality phrase pairs from low-quality ones. Finally, we extract Chinese abbreviation and full-form pairs from the high-quality group using their corresponding English translations and a set of rules. The experiments show that our approach extracts abbreviations with high accuracy and could be an effective way to extract Chinese abbreviation and full-form phrase pairs.
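The final step of the pipeline, pairing abbreviations with full forms through a shared English translation, can be illustrated with a minimal sketch. The abstract does not specify the rules used, so the filter below is an assumption chosen only for illustration: the candidate abbreviation must have at least two characters, be strictly shorter than the full form, and have its characters appear in the full form in the same order. The function names `is_subsequence` and `extract_abbreviation_pairs` and the sample data are hypothetical, and the input is assumed to be the phrase pairs already kept by the classifier.

```python
from collections import defaultdict
from itertools import combinations


def is_subsequence(short: str, long: str) -> bool:
    """Return True if every character of `short` appears in `long`, in order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def extract_abbreviation_pairs(phrase_pairs):
    """Pair Chinese phrases that share the same English translation.

    `phrase_pairs` is an iterable of (chinese, english) tuples, assumed to be
    the high-quality pairs that survived classification.
    Returns a sorted list of (abbreviation, full_form) tuples.
    """
    # Group Chinese phrases by their normalized English translation.
    by_english = defaultdict(set)
    for zh, en in phrase_pairs:
        by_english[en.strip().lower()].add(zh)

    results = set()
    for zh_phrases in by_english.values():
        for a, b in combinations(zh_phrases, 2):
            short, long = (a, b) if len(a) < len(b) else (b, a)
            # Illustrative rule (an assumption, not the paper's rule set):
            # at least two characters, strictly shorter, ordered subsequence.
            if 2 <= len(short) < len(long) and is_subsequence(short, long):
                results.add((short, long))
    return sorted(results)


# Hypothetical sample data for demonstration only.
sample_pairs = [
    ("北京大学", "Peking University"),
    ("北大", "Peking University"),
    ("人民代表大会", "People's Congress"),
    ("人大", "People's Congress"),
    ("清华大学", "Tsinghua University"),
]

for abbreviation, full_form in extract_abbreviation_pairs(sample_pairs):
    print(abbreviation, full_form)
```

On the sample data, 北大 is paired with 北京大学 and 人大 with 人民代表大会, while 清华大学 has no partner sharing its translation and is left out. A subsequence test of this kind would also accept plain truncations, so a real rule set would likely need further constraints.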