Script independent text segmentation of document images using graph network based shortest path scheme

Parul Sahare, Jitendra V. Tembhurne, Mayur R. Parate, Tausif Diwan, Sanjay B. Dhok
DOI: https://doi.org/10.1007/s41870-023-01230-w
2023-03-25
International Journal of Information Technology
Abstract: Document image processing is a growing research field in the digital world, with applications such as database indexing, text recognition, signature verification, and web search engines. Segmenting intermixed text (handwritten and machine-printed) in documents is a difficult task. In this paper, script-independent text-line and word segmentation techniques are proposed. Dijkstra's algorithm is employed for text-line segmentation, whereas the wavelet transform is used for word segmentation. Text-line segmentation is modeled as a general image segmentation task: Dijkstra's algorithm, a shortest-path planning method, drives a boundary-growing process that forms potential text-line boundary regions. For word segmentation, an energy map is first computed using the wavelet transform, and a Gaussian filter is then applied to create text blocks. The proposed techniques are evaluated on several databases containing documents in different scripts. In a benchmarking analysis against other approaches, the text-line and word segmentation techniques achieve the highest segmentation accuracies, 97.6% and 98.1%, respectively.
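The abstract says Dijkstra's shortest-path algorithm drives a boundary-growing process for text lines but does not give the graph or the edge costs. A minimal sketch, assuming a binarised image in which every pixel is a graph node and ink pixels are made expensive to enter: the cheapest left-to-right path then runs through the white gap between two adjacent lines. The names `separator_path` and `MOVES` and the parameters `ink_cost`, `step_cost` and `turn_cost` are hypothetical, and their values are illustrative, not taken from the paper.

```python
import heapq
from typing import List, Tuple

# Allowed moves (row delta, column delta): right, up-right, down-right, up, down.
MOVES = [(0, 1), (-1, 1), (1, 1), (-1, 0), (1, 0)]

def separator_path(img: List[List[int]], start_row: int,
                   ink_cost: float = 100.0, step_cost: float = 1.0,
                   turn_cost: float = 1.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Dijkstra search from (start_row, 0) to the cheapest pixel in the last column.

    img is assumed binarised: 1 = ink, 0 = background. Entering a pixel costs
    step_cost for a horizontal move, turn_cost for any move with a vertical
    component, plus ink_cost if that pixel is ink, so the path prefers to run
    through the gap between two text lines.
    """
    rows, cols = len(img), len(img[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    prev = {}
    dist[start_row][0] = ink_cost * img[start_row][0]
    heap = [(dist[start_row][0], start_row, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry; a cheaper route was already settled
        if c == cols - 1:
            # Every last-column pixel is a target, so the first one popped is optimal.
            path = [(r, c)]
            while (r, c) in prev:
                r, c = prev[(r, c)]
                path.append((r, c))
            return d, path[::-1]
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                base = step_cost if dr == 0 else turn_cost
                nd = d + base + ink_cost * img[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, nr, nc))
    return inf, []
```

Calling it once per inter-line gap, with a start row taken for instance from a horizontal projection profile (another assumption), would yield one separator per pair of adjacent lines. The large `ink_cost` lets the path cut through a touching descender only when no white route exists; a descender that merely dips into the gap is bypassed with a short detour.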
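For word segmentation, the abstract names only a wavelet-based energy map followed by a Gaussian filter that creates text blocks. The sketch below is one possible reading, assuming a binarised single text-line image, a one-level 2-D Haar transform (the wavelet family and decomposition level used in the paper are not stated here), and a 1-D Gaussian applied to the column-wise energy profile so that small gaps inside a word are bridged while wider gaps between words are not. `haar_detail_energy`, `gaussian_smooth`, `word_spans`, `sigma` and `rel_threshold` are hypothetical names and parameters.

```python
import math
from typing import List, Tuple

def haar_detail_energy(img: List[List[int]]) -> List[List[float]]:
    """One-level 2-D Haar transform of img; returns the summed squared detail
    coefficients per 2x2 block (an odd trailing row/column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    energy = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            d_rows = (a + b - c - d) / 2  # top minus bottom: horizontal edges
            d_cols = (a - b + c - d) / 2  # left minus right: vertical edges
            d_diag = (a - b - c + d) / 2  # diagonal detail
            energy[i][j] = d_rows ** 2 + d_cols ** 2 + d_diag ** 2
    return energy

def gaussian_smooth(signal: List[float], sigma: float) -> List[float]:
    """Convolve a 1-D signal with a normalised Gaussian kernel (zero padding)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(signal)
    return [sum(wgt * signal[i + k] for k, wgt in zip(offsets, kernel) if 0 <= i + k < n)
            for i in range(n)]

def word_spans(img: List[List[int]], sigma: float = 1.0,
               rel_threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return [start, end) full-resolution column ranges of word blocks."""
    energy = haar_detail_energy(img)
    profile = [sum(row[j] for row in energy) for j in range(len(energy[0]))]
    smooth = gaussian_smooth(profile, sigma)
    thr = rel_threshold * max(smooth)
    spans, start = [], None
    for j, v in enumerate(smooth + [0.0]):  # trailing sentinel closes a final run
        if v > thr and start is None:
            start = j
        elif v <= thr and start is not None:
            spans.append((2 * start, 2 * j))  # half-res index -> full-res column
            start = None
    return spans
```

For a 4-row line with single-pixel vertical strokes in columns 2, 4, 16 and 18, `word_spans` returns `[(0, 8), (14, 22)]`: two word blocks, each slightly padded by the Gaussian blur. A larger `sigma` merges neighbouring blocks more aggressively, so in practice it would likely be tied to the expected inter-character spacing.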