Pinyin-indexed Method for Approximate Matching in Chinese

Fang Zheng
2009-01-01
Abstract: Exact matching is key to popular commercial search engines. A Chinese approximate matching method with an index structure was developed to achieve better retrieval when the input contains errors. Three types of similarity measures between two Chinese strings were developed, based on the character edit-distance, the Pinyin edit-distance, and the Pinyin improved edit-distance. These similarity measures were used to expand the user's query so that the approximate matching task can be represented as several exact matching sub-tasks. The results of these exact matches were merged and sorted by their similarity to the original query. Tests on a webpage text database showed that the Pinyin improved edit-distance achieved a recall of 50.4% and a precision of 60.4%, with only a small increase in time and space complexity.
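The pipeline described in the abstract (similarity by edit-distance, query expansion into exact-match sub-tasks, then merging and ranking) can be illustrated with a minimal sketch. Several parts are assumptions, not the paper's method. The tiny `PINYIN` table is hypothetical and ignores tones and polyphonic characters. Similarity is normalized as `1 - distance / longer length`. The inverted index is a plain `term -> doc ids` dictionary. The candidate scan is naive because the abstract does not describe the paper's index structure. The "Pinyin improved edit-distance" is not implemented, since its definition is not given here.

```python
# Hypothetical toy Pinyin table (toneless); a real system needs a full
# character-to-Pinyin dictionary and must handle polyphonic characters.
PINYIN = {
    "中": "zhong", "钟": "zhong", "国": "guo", "果": "guo",
    "人": "ren", "民": "min", "银": "yin", "行": "hang",
}


def edit_distance(a, b):
    """Levenshtein distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x -> y
        prev = cur
    return prev[-1]


def to_pinyin(text):
    """Concatenate toneless Pinyin; characters missing from the table are kept."""
    return "".join(PINYIN.get(ch, ch) for ch in text)


def similarity(a, b, use_pinyin=False):
    """Assumed normalization: 1 - distance / length of the longer string.

    use_pinyin=False compares characters; True compares Pinyin letters,
    so homophone typos (e.g. 钟 for 中) cost nothing.
    """
    if use_pinyin:
        a, b = to_pinyin(a), to_pinyin(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def approximate_search(query, index, threshold=0.5, use_pinyin=False):
    """Approximate matching as several exact lookups.

    `index` is a hypothetical inverted index: term -> set of doc ids.
    Returns (doc_id, score) pairs sorted by descending similarity.
    """
    # 1. Query expansion: every indexed term similar enough to the query.
    expansions = {}
    for term in index:
        s = similarity(query, term, use_pinyin)
        if s >= threshold:
            expansions[term] = s
    # 2. Each expanded term is an exact-match sub-task; 3. merge the hits,
    #    scoring each document by its best-matching expanded term.
    scores = {}
    for term, s in expansions.items():
        for doc in index[term]:
            scores[doc] = max(scores.get(doc, 0.0), s)
    # 4. Rank by similarity to the original query (ties broken by doc id).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = {"中国": {1, 2}, "钟国": {3}, "人民": {4}, "银行": {5}}
    # Character similarity: 中国 is only half similar to the typo 钟国.
    print(approximate_search("钟国", index))
    # -> [(3, 1.0), (1, 0.5), (2, 0.5)]
    # Pinyin similarity: both read "zhongguo", so all three docs tie.
    print(approximate_search("钟国", index, use_pinyin=True))
    # -> [(1, 1.0), (2, 1.0), (3, 1.0)]
```

The demo shows why a Pinyin-based measure helps with Chinese input errors. A homophone typo costs a full character substitution at the character level, but nothing at the Pinyin level.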