An Approach to Script Identification in Multi-language Text Image

Mingji Piao, Rongyi Cui
DOI: https://doi.org/10.1109/icinis.2013.70
2013-01-01
Abstract: This paper proposes a character-level script identification method that uses PCA to distinguish Korean, Chinese, and English scripts. First, an eigenvector space is constructed using PCA, and each segmented character is reconstructed by projecting it into that space. Second, the relative entropy between the original and reconstructed images is computed for their vertical and horizontal histograms. Finally, the script is identified according to the Euclidean distance and the relative entropy between the original and reconstructed images. Experimental results show that the proposed method achieves 99.78% accuracy on correctly segmented characters, effectively solving the script identification problem for multi-language text images containing Korean, Chinese, and English.
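The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one precomputed PCA model (a mean vector and an orthonormal eigenvector basis) per script, reads the vertical and horizontal histograms as normalised column and row sums, and combines distance and relative entropy as a weighted sum. The abstract does not state how the two measures are combined, so the `weight` parameter and all function names here are hypothetical.

```python
import math

def reconstruct(x, mean, basis):
    """Project a flattened image onto an orthonormal basis and map it back."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    recon = list(mean)
    for e in basis:
        coeff = sum(c * ei for c, ei in zip(centered, e))
        for i, ei in enumerate(e):
            recon[i] += coeff * ei
    return recon

def normalise(h, eps=1e-9):
    """Clip negatives, smooth zeros, and scale the histogram to sum to 1."""
    h = [max(v, 0.0) + eps for v in h]
    total = sum(h)
    return [v / total for v in h]

def profiles(x, width):
    """Return normalised (row-sum, column-sum) histograms of a flattened image."""
    rows = [x[i:i + width] for i in range(0, len(x), width)]
    horiz = [sum(r) for r in rows]
    vert = [sum(col) for col in zip(*rows)]
    return normalise(horiz), normalise(vert)

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def score(x, width, mean, basis, weight=1.0):
    """Assumed combination: Euclidean distance plus weighted relative entropy."""
    recon = reconstruct(x, mean, basis)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, recon)))
    ph, pv = profiles(x, width)
    qh, qv = profiles(recon, width)
    return dist + weight * (relative_entropy(ph, qh) + relative_entropy(pv, qv))

def identify(x, width, models):
    """Pick the script whose eigenspace reconstructs the character best."""
    return min(models, key=lambda name: score(x, width, *models[name]))

# Toy 2x2 example with two made-up one-dimensional eigenspaces.
zero_mean = [0.0, 0.0, 0.0, 0.0]
models = {
    "script_a": (zero_mean, [[1.0, 0.0, 0.0, 0.0]]),
    "script_b": (zero_mean, [[0.0, 0.0, 0.0, 1.0]]),
}
x = [1.0, 0.0, 0.0, 0.0]
print(identify(x, 2, models))
```

In this toy case the character lies entirely in the first eigenspace, so its reconstruction there is exact and both the distance and the relative entropy vanish. A real system would learn each basis from training characters of the corresponding script.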