Chinese Abbreviation Identification Using Abbreviation-Template Features and Context Information

Xu Sun, Houfeng Wang
DOI: https://doi.org/10.1007/11940098_26
2006-01-01
Abstract: Chinese abbreviations are frequently used without being defined, which poses considerable difficulty for NLP. In this study, the definition-independent abbreviation identification problem is proposed and treated as a classification task in which abbreviation candidates are classified as either 'abbreviation' or 'non-abbreviation' according to the posterior probability. To identify new abbreviations from existing ones, we add generalization capability to the abbreviation lexicon by replacing words with word classes, thereby creating abbreviation-templates. An SVM model that uses abbreviation-template features together with context information is employed as the classifier. Evaluation on a raw Chinese corpus yields encouraging performance. Our experiments further demonstrate improvements from integrating morphological analysis, substring analysis, and person name identification.
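The template idea can be illustrated with a minimal sketch. This is only one possible reading of the abstract, not the authors' implementation: the character-to-class table `WORD_CLASS`, the class names, the character-level granularity, and the helper functions are all hypothetical, chosen to show how replacing units with classes lets a small lexicon match abbreviations it has never seen.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical character-to-class table; classes and entries are illustrative only.
WORD_CLASS: Dict[str, str] = {
    "中": "COUNTRY", "美": "COUNTRY", "日": "COUNTRY", "英": "COUNTRY",
    "北": "PLACE", "南": "PLACE",
    "大": "SCHOOL",
}

Template = Tuple[str, ...]


def to_template(abbreviation: str) -> Template:
    """Replace each character with its class; unknown characters map to 'UNK'."""
    return tuple(WORD_CLASS.get(ch, "UNK") for ch in abbreviation)


def build_templates(lexicon: List[str]) -> Set[Template]:
    """Generalize a lexicon of known abbreviations into class-sequence templates.

    Templates containing 'UNK' are skipped so that unknown characters
    cannot match one another by accident.
    """
    templates = set()
    for abbreviation in lexicon:
        template = to_template(abbreviation)
        if "UNK" not in template:
            templates.add(template)
    return templates


def template_feature(candidate: str, templates: Set[Template]) -> int:
    """Binary feature: 1 if the candidate's class sequence matches a known template."""
    return int(to_template(candidate) in templates)


# Two known abbreviations: 中美 (China-US) and 北大 (Peking University).
known_abbreviations = ["中美", "北大"]
templates = build_templates(known_abbreviations)
```

Under these assumptions, `template_feature("中英", templates)` and `template_feature("南大", templates)` both fire even though neither string is in the lexicon, while `template_feature("中的", templates)` does not. In the setting the abstract describes, a feature of this kind would be combined with context features and passed to the SVM classifier rather than used as a decision on its own.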