Automatic Style Clustering of Printed Characters in Form Images.

CS Liu, XQ Ding
DOI: https://doi.org/10.1117/12.588138
2005-01-01
Abstract: Style is an important feature of printed and handwritten characters, but it has not been studied as thoroughly as character recognition. In this paper, we try to determine how many typical styles exist in one kind of real-world form image. A hierarchical clustering method has been developed and tested. A cross-recognition error rate constraint is proposed to reduce false combinations during hierarchical clustering, and a cluster selection method is used to delete redundant or unsuitable clusters. The algorithm needs only a similarity measure between pairs of patterns. It is tested with a template-matching-based similarity measure, which can easily be replaced by any other feature and distance measure. A detailed comparison of the effect of each step is given in the paper. In total, 16 typical styles are found, and by giving each character in each style a prototype for recognition, an error rate of 0.78% is achieved on the test set.
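The constrained clustering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the average-linkage merge rule, the function names (cluster_styles, average_similarity, toy_cross_error), the parameters (max_cross_error, min_sim), and the toy data are all assumptions made for illustration. The abstract does not define how the cross-recognition error rate is computed, so here it is a caller-supplied function. The cluster selection step is not shown.

```python
from itertools import combinations


def average_similarity(sim, a, b):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_styles(sim, cross_error, max_cross_error, min_sim=0.0):
    """Agglomerative clustering over a precomputed similarity matrix.

    Assumption: a pair of clusters may be merged only if the caller-supplied
    cross_error(a, b) does not exceed max_cross_error. The paper's actual
    definition of the cross-recognition error rate is not given in the abstract.
    """
    clusters = [[i] for i in range(len(sim))]
    while True:
        candidates = []
        for x, y in combinations(range(len(clusters)), 2):
            s = average_similarity(sim, clusters[x], clusters[y])
            allowed = cross_error(clusters[x], clusters[y]) <= max_cross_error
            if s >= min_sim and allowed:
                candidates.append((s, x, y))
        if not candidates:
            return clusters  # no admissible merge is left
        _, x, y = max(candidates)  # merge the most similar admissible pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid


# Toy data (hypothetical): four patterns, two underlying styles.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
toy_style = [0, 0, 1, 1]


def toy_cross_error(a, b):
    """Stand-in constraint: 0.0 if both clusters share one toy style, else 0.5."""
    return 0.0 if len({toy_style[i] for i in a + b}) == 1 else 0.5


print(cluster_styles(sim, toy_cross_error, max_cross_error=0.05))
# prints [[0, 1], [2, 3]]
```

With the constraint effectively disabled (max_cross_error=1.0), the same toy data collapses into a single cluster, which is the kind of false combination the constraint is meant to prevent.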