Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task in Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features that enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism and are exploited via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated hybrid approaches yield a total relative error reduction of 18% over state-of-the-art baselines. Although they are effective at boosting accuracy, computationally expensive parsers make hybrid systems unsuitable for many realistic NLP applications. In this article, we are therefore also concerned with improving tagging efficiency at test time. In particular, we use unlabeled data to transfer the predictive power of hybrid models to simple sequence models: hybrid systems are used to create large-scale pseudo training data for cheap models. Experimental results show that the re-compiled models not only achieve high per-token classification accuracy, but also serve well as a front end to a parser.
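
The pseudo-training-data idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' system: `expensive_tagger` is a hypothetical lexicon lookup standing in for the slow hybrid tagger, the three sentences are toy data, and the cheap model is a simple most-frequent-tag lookup rather than a real sequence model.

```python
from collections import Counter, defaultdict


def expensive_tagger(words):
    """Hypothetical stand-in for an accurate but slow hybrid system."""
    lexicon = {"我": "PN", "他": "PN", "喜欢": "VV", "读": "VV", "书": "NN", "猫": "NN"}
    return [lexicon.get(w, "NN") for w in words]


def build_pseudo_corpus(unlabeled_sentences, teacher):
    """Label raw sentences with the teacher to obtain pseudo training data."""
    return [(sentence, teacher(sentence)) for sentence in unlabeled_sentences]


def train_cheap_tagger(corpus, default_tag="NN"):
    """Train a most-frequent-tag model (a toy substitute for a sequence model)."""
    counts = defaultdict(Counter)
    for words, tags in corpus:
        for word, tag in zip(words, tags):
            counts[word][tag] += 1
    table = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def tag(words):
        # Unknown words fall back to the default tag.
        return [table.get(w, default_tag) for w in words]

    return tag


# Toy unlabeled data (assumed, not from the paper).
unlabeled = [["我", "喜欢", "书"], ["他", "读", "书"], ["他", "喜欢", "猫"]]
pseudo = build_pseudo_corpus(unlabeled, expensive_tagger)
cheap_tagger = train_cheap_tagger(pseudo)

print(cheap_tagger(["我", "读", "书"]))  # ['PN', 'VV', 'NN']
```

At test time only `cheap_tagger` is called, so the cost of the teacher is paid once, when the unlabeled data is labeled.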

Experimental Study of Hidden Markov Model Based Part-of-speech Tagging for Chinese Texts

Incorporating External POS Tagger for Punctuation Restoration

Towards Accurate and Efficient Chinese Part-of-Speech Tagging

A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora

Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features

Hybrid Chinese Text Chunking

Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning

Combining Context Features by Canonical Belief Network for Chinese Part-Of-Speech Tagging

The Effect of Part-Of-Speech on Mandarin Speech Recognition

Quality Assurance Of Automatic Annotation Of Very Large Corpora: A Study Based On Heterogeneous Tagging Systems

Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Unified Framework of Performing Chinese Word Segmentation and Part-Of-Speech Tagging

Attention-based BILSTM network with part-of-speech features for Chinese text classification

Exploring Multiple Features for POS Guessing of Chinese Unknown Words with Maximum Entropy Models

A Chinese Part-of-speech Tagging Approach Using Conditional Random Fields

TPOS Tagging Method Based on BiLSTM_CRF Model

Segmentation and Tagging Ambiguity for Chinese Using HMM

Vietnamese Part of Speech Tagging Based on Multi-category Words Disambiguation Model

CSeg&Tag1.0

Research on Deep Processing Technologies for Large-Scale Corpora

Universal Semantic Tagging for English and Mandarin Chinese