The Research of Chinese Word Segmentation Algorithm Based on Forward Maximum Match

Long Hua
Abstract: Automatic Chinese word segmentation is a key component in many fields, such as Chinese information processing and Web document mining, and the segmentation algorithm is at its core. The forward maximum matching (FMM) algorithm is fast, simple, and easy to implement, but it has a drawback: the initial value of the maximum word length is fixed, which can cause longer words to be matched repeatedly. To address this problem, this paper proposes an improvement to FMM that dynamically assigns the maximum length of the text to be processed based on the word lengths in the Chinese word segmentation dictionary. Finally, the algorithm is tested and validated through experiments. Compared with standard FMM, the accuracy of Chinese word segmentation improves.
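To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. `fmm_segment` is textbook FMM with a fixed window `max_len`. `fmm_segment_dynamic` is only one possible reading of "dynamically assigning the maximum length based on word lengths in the dictionary": it assumes the window at each position is the length of the longest dictionary word that starts with the current character. The function names, the first-character index, and the toy lexicon in the usage note are all hypothetical.

```python
def fmm_segment(text, lexicon, max_len):
    """Baseline forward maximum matching with a fixed window size."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_length_index(lexicon):
    """Map each first character to the length of the longest word starting with it."""
    index = {}
    for word in lexicon:
        if word:  # skip empty entries
            index[word[0]] = max(index.get(word[0], 1), len(word))
    return index


def fmm_segment_dynamic(text, lexicon):
    """FMM variant whose window is derived from the lexicon (an assumed scheme)."""
    length_index = build_length_index(lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Window = longest lexicon word beginning with the current character,
        # or 1 if no lexicon word begins with it.
        window = min(length_index.get(text[i], 1), len(text) - i)
        for size in range(window, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With a toy lexicon such as `{"研究", "研究生", "生命", "起源", "中华人民共和国"}`, calling `fmm_segment("中华人民共和国", lexicon, 4)` falls back to single characters because the fixed window is shorter than the word, whereas `fmm_segment_dynamic` recovers the whole word without a hand-tuned `max_len`. Under this assumed scheme, positions whose character starts no dictionary word also skip the shrinking loop entirely.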
Computer Science