Fast Chinese Calligraphic Character Recognition with Large-Scale Data
Gao Pengcheng, Wu Jiangqin, Lin Yuan, Xia Yang, Mao Tianjiao
DOI: https://doi.org/10.1007/s11042-014-1969-3
IF: 2.577
2014-01-01
Multimedia Tools and Applications
Abstract: Chinese calligraphy draws a lot of attention for its beauty and elegance. However, because the shapes and styles of calligraphic characters are complex, common users find them difficult to recognize, so a tool that helps users recognize unknown calligraphic characters would be valuable. The well-known OCR (Optical Character Recognition) technology can hardly help, because calligraphic characters are deformed and complex. In CADAL, a Calligraphic Character Dictionary (CalliCD), which contains character images labeled with their semantic meaning, has been constructed and made available online to common users. With the help of CalliCD, users can learn more about an unknown calligraphic character by performing similarity-based searching. But as CalliCD grows, one-to-one similarity-based searching takes an intolerable amount of time, and strategies that can handle large-scale data are needed. In this paper, a fast retrieval-based recognition schema is proposed. In addition, a novel shape descriptor, called GIST-SC, is proposed to represent calligraphic character images for efficient and effective retrieval. The schema works in three steps. First, approximate nearest neighbors of the character image to be recognized are found quickly. Second, one-to-one fine matching is performed between these approximate nearest neighbors and the character image to be recognized. Finally, the recognition result is given based on semantic probability. Our experiments show that the GIST-SC descriptor and the recognition schema are efficient and effective for Chinese calligraphic character recognition with CalliCD.
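The three-step schema described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the abstract does not say how GIST-SC is computed, which approximate nearest neighbor method is used, or how the semantic probability is defined. Here descriptors are plain lists of floats, the coarse search uses random-hyperplane hash codes compared by Hamming distance, fine matching uses Euclidean distance, and the probability is a similarity-weighted vote. All function names and parameters (`make_hyperplanes`, `recognize`, `n_candidates`, `k`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed stand-in for the paper's ANN index)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Binary code: one bit per hyperplane, set by the sign of the projection."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


def build_index(database, planes):
    """Precompute one hash code per (descriptor, label) entry."""
    return [hash_code(desc, planes) for desc, _ in database]


def recognize(query, database, codes, planes, n_candidates=3, k=3):
    """Return a dict mapping each candidate label to an estimated probability."""
    # Step 1: coarse search. Rank entries by Hamming distance between codes
    # and keep the closest n_candidates as approximate nearest neighbors.
    q_code = hash_code(query, planes)
    ranked = sorted(range(len(database)), key=lambda i: hamming(q_code, codes[i]))
    candidates = ranked[:n_candidates]

    # Step 2: one-to-one fine matching on the shortlist only
    # (Euclidean distance is an assumption, not the paper's matching cost).
    fine = sorted((math.dist(query, database[i][0]), i) for i in candidates)[:k]

    # Step 3: similarity-weighted vote per label, normalized to sum to 1
    # (an assumed stand-in for the paper's semantic probability).
    scores = defaultdict(float)
    for dist, i in fine:
        scores[database[i][1]] += 1.0 / (1.0 + dist)
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Toy database of (descriptor, label) pairs with made-up 3-D descriptors.
database = [
    ([1.0, 0.1, 0.0], "yong"),
    ([0.9, 0.2, 0.1], "yong"),
    ([-1.0, -0.1, 0.0], "he"),
    ([-0.9, -0.2, -0.1], "he"),
]
planes = make_hyperplanes(dim=3, n_bits=16)
codes = build_index(database, planes)
probabilities = recognize([0.95, 0.15, 0.05], database, codes, planes)
```

The point of the split is that the expensive one-to-one comparison in step 2 runs only on the short candidate list from step 1. The sketch still scans every hash code linearly; at the scale the abstract describes, a real index would presumably bucket the codes to avoid even that scan.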