Predicting Chinese Abbreviations with Minimum Semantic Unit and Global Constraints.

Longkai Zhang, Li, Houfeng Wang, Xu Sun
DOI: https://doi.org/10.3115/v1/d14-1147
2014-01-01
Abstract: We propose a new Chinese abbreviation prediction method that incorporates rich local information while generating the abbreviation globally. Unlike previous character tagging methods, we introduce the minimum semantic unit, which is finer-grained than a word but coarser-grained than a character, to capture word-level information in the sequence labeling framework. To solve the “character duplication” problem in Chinese abbreviation prediction, we also use a substring tagging strategy to generate local substring tagging candidates. We then use an integer linear programming (ILP) formulation with various constraints to globally decode the final abbreviation from the generated candidates. Experiments show that our method outperforms state-of-the-art systems without using any extra resources.
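The global decoding step can be illustrated with a toy sketch. This is not the paper's formulation: the candidate spans, their integer scores, the single constraint (every character of the full form is covered by exactly one selected candidate), and the function name `decode` are all assumptions made for illustration. To stay self-contained, it enumerates the binary selection variables exhaustively instead of calling an ILP solver.

```python
from itertools import product


def decode(length, candidates):
    """Pick the best-scoring set of substring candidates.

    candidates: list of (start, end, kept, score) with `end` exclusive,
    where `kept` is the text that span contributes to the abbreviation.
    Hypothetical constraint: each position 0..length-1 is covered exactly once.
    """
    best_score, best_abbr = float("-inf"), None
    # Each candidate gets a 0/1 selection variable, as in a 0-1 program.
    for selection in product([0, 1], repeat=len(candidates)):
        chosen = [c for c, x in zip(candidates, selection) if x]
        coverage = [0] * length
        for start, end, _, _ in chosen:
            for i in range(start, end):
                coverage[i] += 1
        if any(count != 1 for count in coverage):
            continue  # violates the exact-cover constraint
        total = sum(score for _, _, _, score in chosen)
        if total > best_score:
            ordered = sorted(chosen, key=lambda c: c[0])
            best_score = total
            best_abbr = "".join(kept for _, _, kept, _ in ordered)
    return best_abbr, best_score


full_form = "北京大学"
# Made-up candidates and scores; a real system would get these from a tagger.
candidates = [
    (0, 2, "北", 9),
    (0, 2, "京", 2),
    (0, 2, "北京", 4),
    (2, 4, "大", 8),
    (2, 4, "学", 3),
    (0, 4, "北大", 15),
]
print(decode(len(full_form), candidates))
```

Exhaustive enumeration is exponential in the number of candidates, so it only suits tiny examples; an ILP solver would take the same objective and constraints and scale further.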