A method for Chinese spelling error correction based on character shapes

XL Lu
2004-01-01
Abstract: Spelling errors in Chinese words often arise from confusion between characters with similar shapes, pronunciations, or meanings. Because an erroneous word and its correct form often have similar shapes, the shapes of Chinese characters can be used to find the correct word. This paper proposes a new method of Chinese word spelling correction based on the shapes of Chinese characters. Each character is decomposed into a series of basic radicals, or etymon symbols, so that it can be coded as a string of etymon symbols. The etymon symbols are the basic units of words. The shape similarity between words and characters is thereby mapped onto the similarity between their etymon symbol sequences. The algorithm for this shape-based method is investigated in detail. The experiment indicates that the proposed method can find the most suitable correct Chinese word. The method has practical application value in machine translation and Chinese natural language processing.
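The idea of mapping shape similarity onto sequence similarity can be illustrated with a minimal sketch. It is not the paper's algorithm: the decomposition table `COMPONENTS`, the lexicon `LEXICON`, the function names, and the use of a normalized edit distance as the similarity measure are all assumptions made for illustration, since the abstract does not specify the etymon coding or the scoring function.

```python
# Hypothetical decomposition table: illustrative only, not the paper's etymon coding.
COMPONENTS = {
    "晴": ["日", "青"],
    "睛": ["目", "青"],
    "情": ["忄", "青"],
    "清": ["氵", "青"],
    "眼": ["目", "艮"],
    "天": ["一", "大"],
}

# Hypothetical lexicon of correct words.
LEXICON = ["眼睛", "事情", "晴天", "清楚"]


def encode(word):
    """Flatten a word into a sequence of component symbols.

    Characters missing from the table are kept as a single symbol.
    """
    symbols = []
    for ch in word:
        symbols.extend(COMPONENTS.get(ch, [ch]))
    return symbols


def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def shape_similarity(w1, w2):
    """Similarity in [0, 1]; 1.0 means identical component sequences."""
    s1, s2 = encode(w1), encode(w2)
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s1, s2) / longest


def correct(word, lexicon):
    """Return the lexicon entry whose shape is closest to `word`."""
    return max(lexicon, key=lambda cand: shape_similarity(word, cand))


print(correct("眼晴", LEXICON))
```

Here the misspelling 眼晴 (with 晴 in place of 睛) differs from 眼睛 in only one of its four component symbols, so 眼睛 scores highest among the candidates. A practical system would need a complete decomposition table and probably a weighting scheme for components, neither of which is given in the abstract.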