Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task in Chinese language processing. Paradigmatic lexical relations are captured explicitly by word clustering on large-scale unlabeled data and are used to design new features that enhance a discriminative tagger. Syntagmatic lexical relations are captured implicitly by syntactic parsing in the constituency formalism and are exploited via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated hybrid approaches yield a total relative error reduction of 18% over state-of-the-art baselines. Although the hybrid systems are effective at boosting accuracy, they rely on computationally expensive parsers, which makes them inappropriate for many realistic NLP applications. In this article, we are therefore also concerned with improving tagging efficiency at test time. In particular, we use unlabeled data to transfer the predictive power of hybrid models to simple sequence models: the hybrid systems are used to create large-scale pseudo training data for the cheaper models. Experimental results show that the re-compiled models not only achieve high per-token classification accuracy, but also serve well as a front end to a parser.
