Using Uniform-Design GEP for Part-of-Speech Tagging.

Chengyao Lv, Huihua Liu, Yuanxing Dong, Fangyuan Li, Yuan Liang
DOI: https://doi.org/10.1142/S0218126617500608
2017-01-01
Journal of Circuits Systems and Computers
Abstract: In natural language processing (NLP), a crucial subsystem in a wide range of applications is the part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. This paper proposes a uniform-design genetic expression programming (UGEP) model for POS tagging. UGEP is used to search for appropriate structures in the function space of POS tagging problems. After evolving sequences of tags, GEP returns the best individual as the solution. Experiments on the Brown Corpus show that (1) in closed-lexicon tests, the UGEP model achieves an accuracy of 98.8%, much better than the genetic algorithm, neural network and hidden Markov model (HMM) models; and (2) in open-lexicon tests, the proposed model achieves an accuracy of 97.4% overall and 88.6% on unknown words.
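The abstract reports two kinds of figures: an overall tagging accuracy and a separate accuracy on unknown words, meaning words that do not appear in the lexicon. The sketch below shows one common way such figures can be computed. It is an illustration only, not the paper's evaluation code: the function name `tagging_accuracy`, its parameters and the toy sentence are all assumptions made here.

```python
from typing import Optional, Sequence, Set, Tuple


def tagging_accuracy(
    words: Sequence[str],
    gold: Sequence[str],
    predicted: Sequence[str],
    lexicon: Set[str],
) -> Tuple[float, Optional[float]]:
    """Return (overall accuracy, accuracy on words absent from the lexicon).

    Hypothetical helper: the second value is None when every word is known,
    as in a closed-lexicon test.
    """
    if not (len(words) == len(gold) == len(predicted)):
        raise ValueError("words, gold and predicted must have equal length")

    correct = 0          # correctly tagged tokens, all words
    unknown_total = 0    # tokens whose word is not in the lexicon
    unknown_correct = 0  # correctly tagged tokens among those

    for word, gold_tag, predicted_tag in zip(words, gold, predicted):
        hit = gold_tag == predicted_tag
        correct += hit
        if word not in lexicon:
            unknown_total += 1
            unknown_correct += hit

    overall = correct / len(words) if words else 0.0
    unknown = unknown_correct / unknown_total if unknown_total else None
    return overall, unknown


# Toy data (invented for illustration, not taken from the Brown Corpus).
words = ["the", "dog", "barks", "loudly"]
gold = ["DET", "NOUN", "VERB", "ADV"]
predicted = ["DET", "NOUN", "VERB", "ADJ"]

# Open-lexicon case: "barks" and "loudly" are unknown words.
open_result = tagging_accuracy(words, gold, predicted, {"the", "dog"})

# Closed-lexicon case: every word is in the lexicon.
closed_result = tagging_accuracy(words, gold, predicted, set(words))
```

In the closed-lexicon call every test word is in the lexicon, so only the overall figure is meaningful; in the open-lexicon call the second value isolates performance on words the tagger has no lexicon entry for.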