Abstract: Dependency parsing has attracted growing interest in natural language processing in recent years due to its simplicity and general applicability to diverse languages. Previous work demonstrates that part-of-speech (POS) is an indispensable feature in dependency parsing, since purely lexical features suffer from a serious data sparseness problem. However, because Chinese has little morphological change, POS tagging has proven to be much more challenging for Chinese than for morphologically richer languages such as English (94% vs. 97% POS tagging accuracy). This leads to severe error propagation in Chinese dependency parsing. Our experiments show that parsing accuracy drops by about 6% when the manual POS tags of the input sentence are replaced with automatic ones generated by a state-of-the-art statistical POS tagger. To address this issue, this paper proposes jointly optimizing POS tagging and dependency parsing in a single model. For our joint models, we propose several dynamic-programming-based decoding algorithms that can incorporate rich POS tagging and syntactic features. We then present an effective pruning strategy that reduces the search space of candidate POS tags, leading to a significant improvement in parsing speed. Experimental results on two Chinese data sets, Penn Chinese Treebank 5.1 and Penn Chinese Treebank 7, demonstrate that our joint models significantly improve on the state of the art in both tagging and parsing accuracy. Detailed analysis shows that the joint method can help resolve syntax-sensitive POS ambiguities such as {NN, VV}. In turn, the POS tags become more reliable and more helpful for parsing, since syntactic features are used in POS tagging. This is the fundamental reason for the performance improvement.

Active Learning for Chinese Dependency Parsing

Chinese Dependency Parsing Based on Treebank

Active Learning for Dependency Parsing with Partial Annotation

A Statistical Dependency Parser of Chinese under Small Training Data

Diversity-Aware Batch Active Learning for Dependency Parsing

Iterative Integration of Unsupervised Features for Chinese Dependency Parsing

A Practical Chinese Dependency Parser Based on A Large-scale Dataset

Using Short Dependency Relations from Auto-Parsed Data for Chinese Dependency Parsing

A Study on Constituent-to-Dependency Conversion

Active Learning for Chinese Word Segmentation on Judgements

Building Powerful Dependency Parsers for Resource-Poor Languages

A Pilot Study on Dialogue-Level Dependency Parsing for Chinese

Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing

Dependency Parsing with Noisy Multi-annotation Data

Chinese Statistical Parsing with Rich Linguistic Features

Training Dependency Parsers with Partial Annotation

Ungreedy methods for Chinese deterministic dependency parsing

Probabilistic Models for Action-Based Chinese Dependency Parsing

A Separately Passive-Aggressive Training Algorithm for Joint POS Tagging and Dependency Parsing

Joint Optimization for Chinese POS Tagging and Dependency Parsing

Cross-Lingual Universal Dependency Parsing Only from One Monolingual Treebank