IDENTIFICATION OF CHINESE UNKNOWN WORDS BASED ON GENETIC ALGORITHM

Yan Rong, Zhang Lei
DOI: https://doi.org/10.3969/j.issn.1000-386X.2008.07.036
2008-01-01
Abstract: This paper proposes a new recognition method based on a genetic algorithm to address the difficulty of recognizing Chinese unknown words during word segmentation. The method extends the segmentation capability and treats unknown word recognition as a binary classification problem: after pre-processing, the single-character words in segmentation fragments are divided into two categories, 'combinable' and 'not combinable'. A genetic algorithm first identifies the single-character words in the segmentation fragments, and the remaining adjacent single-character words are then combined to complete the recognition of Chinese unknown words. Experimental results show that the method is effective at recognizing Chinese unknown words and improves both the precision and the recall of recognition.
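The two-stage idea in the abstract (label each single-character word as 'combinable' or 'not combinable', then join adjacent combinable characters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators. The names `merge_combinable` and `evolve_labels`, the bit-string encoding (1 = combinable, 0 = not combinable), the truncation selection, the one-point crossover, and the caller-supplied `fitness` function are all assumptions made for illustration.

```python
import random


def merge_combinable(chars, labels):
    """Join runs of adjacent characters labeled 1 ('combinable').

    Characters labeled 0 ('not combinable') stay as single-character words.
    """
    words, run = [], ""
    for ch, lab in zip(chars, labels):
        if lab == 1:
            run += ch
        else:
            if run:
                words.append(run)
                run = ""
            words.append(ch)
    if run:
        words.append(run)
    return words


def evolve_labels(n, fitness, pop_size=20, generations=50,
                  mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over bit strings of length n.

    `fitness` is a hypothetical scoring function supplied by the caller;
    the paper's actual fitness function is not given in the abstract.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover (assumed operator).
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Merging step with hand-assigned labels (illustrative fragment).
chars = ["我", "是", "王", "小", "明"]
print(merge_combinable(chars, [0, 0, 1, 1, 1]))  # ['我', '是', '王小明']
```

In a real system the fitness function would presumably score a labeling against features of the segmentation fragments; here it is left as a parameter so that no such features are invented.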