Distinguishing Text/Non-Text Natural Images With Multi-Dimensional Recurrent Neural Networks

Pengyuan Lyu, Baoguang Shi, Chengquan Zhang, Xiang Bai
DOI: https://doi.org/10.1109/ICPR.2016.7900256
2016-01-01
Abstract: In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a large pool of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text/non-text images by classifying local image blocks, taking both region pixels and dependencies among blocks into account. The network is composed of a Convolutional Neural Network (CNN) and a Multi-Dimensional Recurrent Neural Network (MDRNN). The CNN extracts rich, high-level image representations, while the MDRNN analyzes dependencies along multiple directions and produces block-level predictions. Evaluating CMDRNN on a public dataset, we observe improvements over prior art in terms of both speed and accuracy.
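
The block-level idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one scalar feature per block in place of CNN features, a plain tanh recurrence in place of whatever recurrent cell the paper uses, hand-picked untrained weights, and a hypothetical image-level rule (an image counts as text if any block's probability exceeds a threshold). All function names and parameters are illustrative.

```python
import math


def scan_2d(x, w_in, w_row, w_col, bias, down, right):
    """One directional 2D recurrent scan over a grid of scalar block features.

    Each block's hidden state depends on its own feature and on the hidden
    states of the two neighbours already visited in this scan direction.
    """
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    row_order = range(rows) if down else range(rows - 1, -1, -1)
    col_order = range(cols) if right else range(cols - 1, -1, -1)
    dr = -1 if down else 1   # offset to the previously visited row
    dc = -1 if right else 1  # offset to the previously visited column
    for i in row_order:
        for j in col_order:
            pi, pj = i + dr, j + dc
            prev_row = h[pi][j] if 0 <= pi < rows else 0.0
            prev_col = h[i][pj] if 0 <= pj < cols else 0.0
            h[i][j] = math.tanh(
                w_in * x[i][j] + w_row * prev_row + w_col * prev_col + bias
            )
    return h


def block_probabilities(x, w_in=1.0, w_row=0.5, w_col=0.5, bias=-1.0):
    """Sum the four directional scans and squash to a per-block probability.

    The weights are assumed, untrained values chosen only for illustration.
    """
    scans = [
        scan_2d(x, w_in, w_row, w_col, bias, down, right)
        for down in (True, False)
        for right in (True, False)
    ]
    rows, cols = len(x), len(x[0])
    return [
        [1.0 / (1.0 + math.exp(-sum(s[i][j] for s in scans))) for j in range(cols)]
        for i in range(rows)
    ]


def image_contains_text(x, threshold=0.5):
    """Hypothetical image-level rule: text if any block exceeds the threshold."""
    probs = block_probabilities(x)
    return any(p > threshold for row in probs for p in row)


# A 5x5 grid of block features: all background, then one strongly text-like block.
blank = [[0.0] * 5 for _ in range(5)]
with_text = [row[:] for row in blank]
with_text[2][2] = 3.0
```

In this sketch, `image_contains_text(blank)` is false and `image_contains_text(with_text)` is true. Because the scans run in all four diagonal directions, the strong block at the centre also raises the probabilities of its neighbours, which is the sense in which dependencies among blocks enter the block-level predictions.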