Abstract: Dependency parsing has attracted growing interest in natural language processing in recent years because of its simplicity and its general applicability to diverse languages. Previous work demonstrates that part-of-speech (POS) tags are an indispensable feature in dependency parsing, since purely lexical features suffer from a serious data sparseness problem. However, because Chinese has little morphological change, Chinese POS tagging has proven much more challenging than tagging for morphologically richer languages such as English (94% vs. 97% POS tagging accuracy). This leads to severe error propagation in Chinese dependency parsing. Our experiments show that parsing accuracy drops by about 6% when the manual POS tags of the input sentence are replaced with automatic ones generated by a state-of-the-art statistical POS tagger. To address this issue, this paper proposes jointly optimizing POS tagging and dependency parsing in a single model. We propose several dynamic-programming-based decoding algorithms for our joint models that can incorporate rich POS tagging and syntactic features. We then present an effective pruning strategy that reduces the search space of candidate POS tags, leading to a significant improvement in parsing speed. Experimental results on two Chinese data sets, Penn Chinese Treebank 5.1 and Penn Chinese Treebank 7, demonstrate that our joint models significantly improve on state-of-the-art accuracy in both tagging and parsing. Detailed analysis shows that the joint method helps resolve syntax-sensitive POS ambiguities such as {NN, VV}. In return, the POS tags become more reliable and more helpful for parsing, since syntactic features are used in POS tagging. This is the fundamental reason for the performance improvement.
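The effect of pruning candidate POS tags on the joint search space can be illustrated with a small sketch. The abstract does not specify the pruning criterion, so the rule below (keep a tag only if its probability is within a fixed ratio of the best tag for that word, up to a per-word cap) is an assumption chosen for illustration, as are the function names, the thresholds, and the toy probabilities.

```python
from typing import Dict, List


def prune_tags(tag_probs: Dict[str, float],
               ratio: float = 0.1,
               max_tags: int = 2) -> List[str]:
    """Return the candidate tags kept for one word.

    Assumed rule (not taken from the paper): keep a tag when its
    probability is at least `ratio` times the best tag's probability,
    and keep at most `max_tags` tags per word.
    """
    if not tag_probs:
        return []
    # Rank tags from most to least probable.
    ranked = sorted(tag_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_prob = ranked[0][1]
    kept = [tag for tag, prob in ranked if prob >= ratio * best_prob]
    return kept[:max_tags]


def search_space_size(candidates: List[List[str]]) -> int:
    """Number of distinct tag sequences a joint decoder would have to consider."""
    size = 1
    for tags in candidates:
        size *= len(tags)
    return size


# Hypothetical per-word tag probabilities from a baseline tagger.
sentence = [
    {"NN": 0.55, "VV": 0.40, "JJ": 0.05},  # noun/verb ambiguity survives pruning
    {"NN": 0.98, "VV": 0.02},              # confident word keeps a single tag
    {"VV": 0.70, "NN": 0.25, "AD": 0.05},
]

unpruned = [list(word.keys()) for word in sentence]
pruned = [prune_tags(word) for word in sentence]

print(search_space_size(unpruned), search_space_size(pruned))
```

Under these assumptions the ambiguous words keep both NN and VV, so the parser can still use syntactic context to choose between them, while confident words contribute only one tag to the search.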

Bilingually-constrained (monolingual) Shift-Reduce Parsing

Introducing more features to improve Chinese shift-reduce parsing

Improving Shift-Reduce Phrase-Structure Parsing with Constituent Boundary Information

Improving shift-reduce constituency parsing with large-scale unlabeled data.

Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing.

Fast and Accurate Shift-Reduce Constituent Parsing.

A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

Exploiting Multiple Treebanks for Parsing with Quasi-Synchronous Grammars

Cross-Lingual Universal Dependency Parsing Only from One Monolingual Treebank

Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model.

Single-Model System Combination for Shift-Reduce Parser

Cross Language Dependency Parsing Using a Bilingual Lexicon.

Cross-Lingual Dependency Parsing by POS-Guided Word Reordering.

Shift-Reduce Constituent Parsing with Neural Lookahead Features.

Bilingually Induced Clause Parser for Tree-based Translation

Joint Optimization for Chinese POS Tagging and Dependency Parsing

Building Powerful Dependency Parsers for Resource-Poor Languages

Joint Parsing and Translation

Joint syntactic and semantic parsing of Chinese

Incremental Parsing with Minimal Features Using Bi-Directional LSTM

High-order Joint Constituency and Dependency Parsing