Skew Detection For Complex Document Images Using Robust Borderlines In Both Text And Non-Text Regions

Hong Liu, Qi Wu, Hongbin Zha, Xueping Liu
DOI: https://doi.org/10.1016/j.patrec.2008.06.008
IF: 4.757
2008-01-01
Pattern Recognition Letters
Abstract: This paper proposes a new skew detection method for complex document images, based on robust borderlines extracted from both text and non-text regions. First, borderlines are extracted from the borders of large connected components in the document image using a run-length-based method. Second, after non-linear borderlines are filtered out, a fast iterative algorithm optimizes the directional angle of each linear borderline. Finally, the weighted median of all the directional angles is taken as the skew angle of the whole document. Experiments were conducted on 2000 document images with various skew angles. The overall correct rate is 95.2%, and the average detection time is less than 0.2 s per document. The proposed method is effective for complex documents with horizontal and vertical text layouts and with text in English, Japanese, and Chinese, and especially for documents with predominant non-text regions or sparse text regions. © 2008 Elsevier B.V. All rights reserved.
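The final step, taking a weighted median of the borderline angles, can be illustrated with a minimal Python sketch. The abstract does not say how the weights are chosen, so weighting each angle by its borderline's length in pixels is an assumption made here for illustration only. The function name `weighted_median` and the sample values are likewise hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return the smallest value whose cumulative weight reaches half the total."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Sort the (angle, weight) pairs by angle, then walk the cumulative weight.
    half = total / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    return max(values)  # defensive fallback; the loop returns before this


# Hypothetical directional angles (degrees) of five linear borderlines.
angles = [1.9, 2.1, 2.0, 15.0, -30.0]
# ASSUMPTION: each weight is the borderline's length in pixels.
lengths = [120.0, 200.0, 180.0, 40.0, 10.0]

skew_angle = weighted_median(angles, lengths)
print(f"Estimated skew angle: {skew_angle} degrees")
```

In this example the two outlying angles (15.0 and -30.0) belong to short borderlines and do not move the estimate, which suggests why a weighted median can be more robust than a mean when a few borderlines are badly estimated.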