Script Identification in the Wild Via Discriminative Convolutional Neural Network

Baoguang Shi, Xiang Bai, Cong Yao
DOI: https://doi.org/10.1016/j.patcog.2015.11.005
IF: 8
2016-01-01
Pattern Recognition
Abstract: Script identification facilitates many important applications in document and video analysis. This paper investigates a relatively new problem: identifying scripts in natural images. The basic idea is to combine deep features and mid-level representations in a globally trainable deep model. Specifically, a pre-trained CNN model first extracts a set of deep feature maps from the input images, from which local deep features are densely collected. Discriminative clustering is then performed on these local features to learn a set of discriminative patterns. A mid-level representation is obtained by encoding the local features against the learned discriminative patterns (the codebook). Finally, the mid-level representations and the deep features are jointly optimized in a deep network. Benefiting from this fine-grained classification strategy, the optimized deep model, termed Discriminative Convolutional Neural Network (DisCNN), can effectively reveal the subtle differences among scripts that are difficult to distinguish, e.g. Chinese and Japanese. In addition, a large-scale dataset named SIW-13, containing 16,291 in-the-wild text images in 13 scripts, is created for evaluation. Our method is not limited to text images: it also performs effectively on video and document scripts, and it requires neither preprocessing such as binarization or segmentation nor hand-crafted features. Experimental comparisons on SIW-13, CVSI-2015 and Multi-Script consistently show that DisCNN is a state-of-the-art approach for script identification.
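The encoding step described in the abstract (local features scored against a learned codebook of discriminative patterns) can be illustrated with a small sketch. This is not the authors' implementation: the function names `dot` and `encode`, the use of a dot product as the response, and max pooling over all local features are assumptions made for illustration, since the abstract does not specify them. The toy vectors stand in for CNN features and learned patterns.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as an assumed pattern-response measure."""
    return sum(x * y for x, y in zip(a, b))


def encode(local_features: List[Vector], codebook: List[Vector]) -> List[float]:
    """Encode an image as one value per codebook pattern.

    For each pattern, take the strongest response over all local features
    (max pooling). Both choices are illustrative assumptions.
    """
    if not local_features:
        raise ValueError("need at least one local feature")
    return [max(dot(f, p) for f in local_features) for p in codebook]


# Toy data: three 2-D "local features" and a hypothetical two-pattern codebook.
features = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
codebook = [[1.0, 0.0], [0.0, 1.0]]

representation = encode(features, codebook)
print(representation)  # [1.0, 2.0]
```

The resulting vector has one entry per pattern, so its length depends only on the codebook size and not on how many local features the image produced. That property is what lets a representation of this kind feed a fixed-size classifier.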