Abstract: Conventional sequence labeling methods for Chinese word segmentation do not fully exploit linguistic information, which limits further performance gains. Chinese morphology studies the construction and usage of Chinese words in depth and is therefore helpful for word segmentation. Furthermore, some segmentation ambiguities cannot be resolved by lexical information alone; they are finally disambiguated only during parsing. In this paper, we propose a parsing-based Chinese word segmentation model that makes full use of morphological and syntactic information. Experiments on the Penn Chinese Treebank (CTB) 5.0 show that the proposed model achieves performance competitive with the CRFs-based model. To investigate the relationship between our parsing-based model and the CRFs-based model, we employ a maximum-entropy-based framework for integrating different knowledge sources. The integrated model obtains an F-measure of 97.9, a 25% reduction in segmentation error rate relative to the CRFs-based model, which indicates that the two models are complementary.
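For readers unfamiliar with the two figures quoted in the abstract, the following is a minimal sketch of how a word-level segmentation F-measure and a relative error rate reduction are commonly computed. It is not taken from the paper: the function names, the span-matching definition of a correct word, the convention that error equals 100 minus F, and all example inputs (including the baseline score of 96.0) are illustrative assumptions.

```python
def to_spans(words):
    """Convert a word sequence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_measure(gold_words, pred_words):
    """Word-level F-measure: a predicted word counts as correct only if
    both of its boundaries match a gold word (assumed convention)."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def error_reduction(baseline_f, new_f):
    """Relative error rate reduction, taking error as 100 - F (assumption)."""
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


# Hypothetical example: the prediction wrongly splits the middle word.
gold = ["中国", "人民", "银行"]
pred = ["中国", "人", "民", "银行"]
score = f_measure(gold, pred)  # precision 2/4, recall 2/3

# Hypothetical scores, not the paper's: 96.0 -> 97.0 removes a quarter of the error.
reduction = error_reduction(96.0, 97.0)
```

Under this convention, a gain that looks small in absolute F-measure can still correspond to a large relative reduction in errors, because the baseline error is itself only a few points.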

Integrating N-gram Model Information for Chinese Word Segmentation Based on Conditional Random Fields.

An Improved Chinese Word Segmentation System with Conditional Random Field

Joint n-gram Chinese language modeling with an application to Chinese word segmentation

CRFs Based Chinese Word Segmentation

Chinese Named Entity Recognition and Word Segmentation Based on Character.

Integrating ngram model and case-based learning for Chinese word segmentation

A morphology-based Chinese word segmentation method

Chinese Word Segmentation with Maximum Entropy and N-gram Language Model

A Hybrid Approach to Chinese Word Segmentation around CRFs

Chinese Chunking Algorithm Based On Conditional Random Fields

Parsing-based Chinese word segmentation integrating morphological and syntactic information

A Chinese Word Segmentation for Statistical Machine Translation

Effective Tag Set Selection In Chinese Word Segmentation Via Conditional Random Field Modeling

Chinese Word Segmentation Based on Mixing Model.

A Local Generative Model For Chinese Word Segmentation

A Joint Model for Unsupervised Chinese Word Segmentation.

Long Short-Term Memory Neural Networks for Chinese Word Segmentation.

Neural Word Segmentation Learning for Chinese

Chinese Word Segmentation and Named Entity Recognition Based on a Context-Dependent Mutual Information Independence Model.

Chinese unknown word recognition using improved conditional random fields

CRF-based Hybrid Model for Word Segmentation, NER and Even POS Tagging