Review of Text Extraction Algorithms for Scene-text and Document Images

Parul Sahare, Sanjay B. Dhok
DOI: https://doi.org/10.1080/02564602.2016.1160805
2016-04-15
IETE Technical Review
Abstract: One of the major applications of text retrieval from images is to extract the text information and then recognize its characters. This is helpful for indexing the images within storage media. When we want to search for a particular image or document, there is no need to go through a large collection of images; we go only through the group of indexed images, so finding the particular image becomes easy. Extracting text lines from scanned document images presents a major problem in the optical character recognition (OCR) process, as skewed text lines raise the complexity. The problem gets even worse when text lines have different orientations; such lines are called multi-skewed lines. These multi-skewed lines are commonly found in both printed and handwritten documents. It is a challenging task to design a real-time system that can maintain a high recognition rate with good accuracy and is independent of the type of document and character font. In this paper, we attempt to analyze and classify the various text extraction schemes for scene-text and document images. We also compare different approaches for these image types based on common problems and discuss their merits and demerits.
Telecommunications; Engineering, Electrical & Electronic
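The abstract notes that skewed text lines complicate line extraction. One common way to estimate a single global skew angle is the projection-profile method. The sketch below illustrates that idea under our own assumptions and is not the authors' method. Ink pixels are given as `(x, y)` coordinates with `y` increasing downward. Each candidate angle projects the pixels onto the axis perpendicular to lines tilted by that angle. The angle whose histogram has the tallest, narrowest peaks wins, scored here by a sum of squared bin counts, which is only one of several possible criteria. The names `profile_sharpness` and `estimate_skew` are hypothetical. This handles only one global angle; multi-skewed documents would need per-line or per-region estimates.

```python
import math
from collections import Counter


def profile_sharpness(points, angle_deg):
    """Score how sharply ink pixels cluster into rows if lines are tilted by angle_deg."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinate of each pixel along the normal to lines tilted by theta,
    # rounded to integer bins (a horizontal projection profile after deskewing).
    bins = Counter(round(y * c - x * s) for x, y in points)
    # Sum of squared bin counts is largest when the profile has tall, narrow peaks.
    return sum(n * n for n in bins.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) whose projection profile is sharpest."""
    n_steps = int(round(max_angle / step))
    candidates = [k * step for k in range(-n_steps, n_steps + 1)]
    return max(candidates, key=lambda a: profile_sharpness(points, a))


if __name__ == "__main__":
    # Synthetic page: three 200-pixel text lines tilted by 3 degrees.
    tilt = math.tan(math.radians(3.0))
    page = [(x, round(y0 + x * tilt)) for y0 in (20, 60, 100) for x in range(200)]
    print(estimate_skew(page))
```

A brute-force search over a fixed angle grid keeps the sketch simple. A practical system would likely refine the estimate around the best coarse angle, or use a Hough-transform-based estimator instead.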