A Handwriting Textline Extraction Approach Based on Connected Domain

Wei Gao, Fuchun Sun, Zhonghang Yin
DOI: https://doi.org/10.1109/coginf.2010.5599738
2010-01-01
Abstract: This paper describes an approach for extracting words, textlines, and text blocks by analyzing the spatial configuration of connected domains and word contour rectangles in a given document image. The basic idea is that connected components of black pixels and their contours can serve as computational units in document image analysis. We determine a spatial feature and the overlap relationships for every contour rectangle, and we call the resulting feature rectangle the “Standard Rectangle” (SR). We then calculate the split line of every textline through a series of operations on the SRs and assign the word contour rectangles to their respective lines. Next, we determine whether adjacent textlines overlap; if they do, we calculate the overlap distance and move the word contour rectangles accordingly. Our experiments show that the approach performs well on both overlapped and detached textlines.
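The pipeline outlined in the abstract (connected components of black pixels, their contour rectangles, grouping into textlines, then an overlap check between adjacent lines) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method. All function names are hypothetical. The abstract does not define the Standard Rectangle operations, so lines are grouped here by a simple stand-in rule: a new line starts when the vertical centre of the next rectangle jumps by more than half the median rectangle height. "Moving" overlapped rectangles is assumed to mean shifting the lower line down by the overlap distance.

```python
from collections import deque
from statistics import median

# A rectangle is (x0, y0, x1, y1) with inclusive pixel coordinates.
Rect = tuple[int, int, int, int]


def component_rects(img: list[list[int]]) -> list[Rect]:
    """Bounding rectangles of 8-connected components of black (1) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited black pixel.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                y, x = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((x0, y0, x1, y1))
    return rects


def group_lines(rects: list[Rect]) -> list[list[Rect]]:
    """Assign rectangles to textlines by vertical centre (assumed stand-in for SR split lines)."""
    if not rects:
        return []
    gap = median(r[3] - r[1] + 1 for r in rects) / 2
    ordered = sorted(rects, key=lambda r: (r[1] + r[3]) / 2)
    lines = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        # Start a new line when the centre jumps by more than half a typical height.
        if (cur[1] + cur[3]) / 2 - (prev[1] + prev[3]) / 2 > gap:
            lines.append([])
        lines[-1].append(cur)
    return lines


def overlap_distance(upper: list[Rect], lower: list[Rect]) -> int:
    """Number of pixel rows shared by two adjacent lines' vertical extents (0 if detached)."""
    bottom = max(r[3] for r in upper)
    top = min(r[1] for r in lower)
    return max(0, bottom - top + 1)


def separate(lines: list[list[Rect]]) -> list[list[Rect]]:
    """Shift each line down by its overlap with the (already shifted) line above.

    Assumed interpretation of "move the word contour rectangles".
    """
    out = [list(lines[0])]
    for line in lines[1:]:
        d = overlap_distance(out[-1], line)
        out.append([(x0, y0 + d, x1, y1 + d) for x0, y0, x1, y1 in line])
    return out
```

For example, a word whose descender reaches row 4 and a word on the next line that starts at row 4 form two textlines with an overlap distance of 1, and `separate` moves the lower line down by one row. A real system would replace `group_lines` with the paper's SR-based split-line computation.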