Resolving Coordinate Structures for Chinese Constituent Parsing.

Yichu Zhou, Shujian Huang, Xinyu Dai, Jiajun Chen
DOI: https://doi.org/10.1007/978-3-319-25207-0_30
2015-01-01
Abstract: Coordinate structures are linguistic structures consisting of two or more conjuncts, which usually combine into a larger constituent that functions as a single unit. However, the boundary of each conjunct is difficult to identify, which makes it hard to parse both the coordinate structure itself and the larger structures that contain it. In labeled data such as the Penn Chinese Treebank (CTB), coordinate structures are not annotated explicitly, which further complicates the problem. In this paper, we treat resolving coordinate structures as an independent sub-problem of parsing. We first define coordinate structures explicitly and design rules to extract them from the labeled CTB data. We then propose a specifically designed grammar for automatically parsing coordinate structures, along with two groups of new features that better model coordinate structures in a shift-reduce parsing framework. Our approach achieves a 15% improvement in F1 score on resolving coordinate structures.
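To make the extraction step concrete, the following is a minimal sketch of how coordinate structures might be pulled out of a CTB-style bracketed tree. The abstract does not state the paper's actual rules, so the heuristic here is an assumption for illustration only: a node is treated as a coordination when a `CC`-tagged child sits between other children, and its non-connective, non-punctuation children are taken as conjuncts. The names `Node`, `parse_tree`, `find_coordinations`, and `CONNECTIVE_TAGS` are hypothetical, as is the example tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: only the CC tag marks a connective. The paper's rules are
# not given in the abstract and are likely richer than this.
CONNECTIVE_TAGS = {"CC"}
PUNCT_TAG = "PU"


@dataclass
class Node:
    label: str
    word: Optional[str] = None          # set only on preterminal nodes
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the words covered by this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def parse_tree(text: str) -> Node:
    """Parse a bracketed tree such as '(NP (NN a) (CC b) (NN c))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        node = Node(label=tokens[pos])
        pos += 1
        if tokens[pos] != "(":
            # Preterminal: a tag followed by a single word.
            node.word = tokens[pos]
            pos += 1
        else:
            while tokens[pos] == "(":
                node.children.append(read())
        assert tokens[pos] == ")", "expected ')'"
        pos += 1
        return node

    return read()


def find_coordinations(node: Node) -> List[Tuple[str, List[str]]]:
    """Collect (parent label, conjunct strings) for each coordination found."""
    found: List[Tuple[str, List[str]]] = []
    kids = node.children
    # Heuristic: a connective must appear strictly between other children.
    if any(k.label in CONNECTIVE_TAGS for k in kids[1:-1]):
        conjuncts = [
            k for k in kids
            if k.label not in CONNECTIVE_TAGS and k.label != PUNCT_TAG
        ]
        if len(conjuncts) >= 2:
            found.append((node.label, [" ".join(c.words()) for c in conjuncts]))
    for k in kids:
        found.extend(find_coordinations(k))
    return found


# Hypothetical example tree: "apples and bananas are tasty".
EXAMPLE = "(IP (NP (NN 苹果) (CC 和) (NN 香蕉)) (VP (VA 好吃)))"
coordinations = find_coordinations(parse_tree(EXAMPLE))
```

In this sketch each conjunct is reported as the word span of one child, which is exactly the boundary information the abstract describes as hard to identify. A real extractor would also need to handle coordinations joined only by punctuation, such as the enumeration comma, which this heuristic ignores.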