Unsupervised Feature Learning for Chinese Lexicon Based on Auto-Encoder

ZHANG Kaixu, ZHOU Changle
2013-01-01
Abstract: Large-scale unlabeled data contains abundant lexical information for NLP tasks such as Chinese word segmentation and POS tagging. This work extracted high-dimensional distributional lexical information from a large-scale unlabeled Chinese corpus. An auto-encoder then performed unsupervised dimensionality reduction. The learned low-dimensional lexicon features were used as additional lexical features for a joint Chinese word segmentation and POS tagging task. Experiments on the Chinese Treebank 5 corpus showed that the additional lexicon features improved performance and outperformed features learned with principal component analysis and the k-means algorithm.
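The pipeline described in the abstract (distributional context vectors, then auto-encoder compression) can be illustrated with a minimal sketch. The abstract does not specify the context definition, network architecture, or training procedure, so everything below is an assumption made for illustration: the toy corpus, the left/right neighbour counts, the single sigmoid hidden layer, the squared-error loss, and the names `context_vectors` and `AutoEncoder` are all hypothetical and are not the authors' implementation.

```python
import math
import random

def context_vectors(sentences):
    """Map each token to scaled counts of its left and right neighbours."""
    vocab = sorted({tok for s in sentences for tok in s})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    vectors = {tok: [0.0] * (2 * n) for tok in vocab}
    for s in sentences:
        for i, tok in enumerate(s):
            if i > 0:
                vectors[tok][index[s[i - 1]]] += 1.0      # left neighbour
            if i + 1 < len(s):
                vectors[tok][n + index[s[i + 1]]] += 1.0  # right neighbour
    for tok, v in vectors.items():
        peak = max(v) or 1.0
        vectors[tok] = [x / peak for x in v]  # scale counts into [0, 1]
    return vectors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class AutoEncoder:
    """One sigmoid hidden layer, trained by SGD on squared reconstruction error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w2, self.b2)]

    def loss(self, data):
        """Squared reconstruction error summed over dimensions, averaged over tokens."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train_step(self, x, lr):
        h = self.encode(x)
        y = self.decode(h)
        # Output-layer error signal for the loss 0.5 * (y - x)^2 with sigmoid units.
        d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
        # Hidden-layer error signal, computed before the decoder weights change.
        d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * h[j] * (1 - h[j])
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

# Toy, made-up "corpus" of pre-segmented sentences (illustration only).
sentences = [["我", "爱", "北京"], ["我", "爱", "上海"],
             ["他", "爱", "北京"], ["他", "去", "上海"]]
vectors = context_vectors(sentences)
data = list(vectors.values())

model = AutoEncoder(n_in=len(data[0]), n_hidden=3)
initial_loss = model.loss(data)
for _ in range(200):
    for x in data:
        model.train_step(x, lr=0.5)
final_loss = model.loss(data)

# Low-dimensional lexicon features: the hidden-layer activations per token.
features = {tok: model.encode(v) for tok, v in vectors.items()}
```

In a setup like the one the abstract describes, the entries of `features` would be added alongside the existing lexical features of the joint segmentation and tagging model. How the real-valued features are incorporated, and the comparison against principal component analysis and k-means, are outside the scope of this sketch.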