Automatic expansion of abbreviations in Chinese news text

Fu Guohong, Luke Kang-Kwong, Zhou GuoDong, Xu Ruifeng
DOI: https://doi.org/10.1007/11880592_42
2006-01-01
Abstract: This paper presents an n-gram-based approach to Chinese abbreviation expansion. We distinguish reduced abbreviations from non-reduced abbreviations, which are created by elimination or generalization. For a reduced abbreviation, a mapping table is compiled that maps each short-word in it to a set of long-words, and a bigram-based Viterbi algorithm is then applied to decode an appropriate combination of long-words as its full-form. For a non-reduced abbreviation, a dictionary of non-reduced abbreviation/full-form pairs is used to generate expansion candidates, and a disambiguation technique based on bigram word segmentation then selects a proper expansion. Evaluated on an abbreviation-expanded corpus built from the PKU corpus, the proposed system achieved an average recall of 82.9% and an average precision of 85.5% across different types of abbreviations in Chinese news text. © Springer-Verlag Berlin Heidelberg 2006.
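
The decoding step for reduced abbreviations can be illustrated with a minimal sketch. It is not the authors' implementation: the function name expand_reduced, the start symbol, the floor score for unseen bigrams, and the toy mapping table and log-probabilities are all assumptions made up for illustration. It only shows how a bigram Viterbi search can pick one long-word per short-word.

```python
import math

def expand_reduced(short_words, mapping, bigram_logp, start="<s>", floor=-20.0):
    """Viterbi-decode the highest-scoring long-word sequence.

    short_words: short-words making up the abbreviation, in order.
    mapping:     short-word -> list of candidate long-words (hypothetical table).
    bigram_logp: (previous_word, word) -> log-probability (hypothetical model).
    floor:       assumed log-probability for bigrams missing from the model.
    """
    # best[w] = (score, path) for the best partial expansion ending in w
    best = {start: (0.0, [])}
    for sw in short_words:
        # A short-word with no table entry is kept unchanged (an assumption).
        candidates = mapping.get(sw, [sw])
        new_best = {}
        for cand in candidates:
            score, path = max(
                (
                    (prev_score + bigram_logp.get((prev, cand), floor), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy data, invented for this example only.
MAPPING = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
BIGRAM = {
    ("<s>", "北京"): math.log(0.6),
    ("<s>", "北方"): math.log(0.4),
    ("北京", "大学"): math.log(0.5),
    ("北京", "大会"): math.log(0.1),
    ("北方", "大学"): math.log(0.2),
    ("北方", "大会"): math.log(0.1),
}

expansion = expand_reduced(["北", "大"], MAPPING, BIGRAM)
print("".join(expansion))
```

With the toy scores above, the abbreviation 北大 decodes to 北京 + 大学, because that pair has the highest combined bigram score among the four candidate combinations. A real system would estimate the bigram scores from a segmented corpus and would need a smoothing scheme in place of the fixed floor value.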