A novel approach to auto-detection of header and footer

Xuefang Wang, Wujun Yan
2010-01-01
Journal of Computational Information Systems
Abstract: Page headers and footers carry information that is useful for document analysis and understanding tasks such as text reflowing, table of contents recognition and chapter delimitation. After reviewing the merits and drawbacks of the existing header and footer detection methods, this paper presents a new header and footer detection approach for book documents based on page association. The method first extracts MFONT, the font most frequently used in the book. It then calculates the similarities between corresponding text lines from different pages of the book in terms of position, font and text content. Finally, it considers the position of each line and the fonts used in it to determine whether the line belongs to a header or footer. The test results demonstrate that this method achieves a very high detection rate and performs better than the existing header and footer detection methods. Copyright © 2010 Binary Information Press.
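The three stages named in the abstract (MFONT extraction, cross-page line similarity, and a final position and font check) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the `Line` record, the equal weighting of the three similarities, the digit masking, the two-page window, the thresholds, the margin size, and the rule that a line set in MFONT needs stronger evidence are all assumptions made here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
import re


@dataclass
class Line:
    text: str
    font: str
    y: float  # vertical centre as a fraction of page height (0.0 = top); assumed layout


def most_frequent_font(pages):
    """MFONT: the font covering the most characters in the book (None if no text)."""
    counts = Counter()
    for page in pages:
        for line in page:
            counts[line.font] += len(line.text)
    return counts.most_common(1)[0][0] if counts else None


def line_similarity(a, b, y_tolerance=0.05):
    """Average of position, font and text similarity, each in [0, 1] (assumed weighting)."""
    position = max(0.0, 1.0 - abs(a.y - b.y) / y_tolerance)
    font = 1.0 if a.font == b.font else 0.0
    # Mask digit runs so that changing page numbers do not lower the text score.
    a_text, b_text = (re.sub(r"\d+", "#", s) for s in (a.text, b.text))
    text = SequenceMatcher(None, a_text, b_text).ratio()
    return (position + font + text) / 3.0


def detect_header_footer(pages, window=2, threshold=0.8, margin=0.1):
    """Compare the first and last line of each page with those of nearby pages."""
    mfont = most_frequent_font(pages)
    results = []
    for i, page in enumerate(pages):
        found = {"header": None, "footer": None}
        for role, index in (("header", 0), ("footer", -1)):
            if not page:
                continue
            line = page[index]
            in_margin = line.y <= margin if role == "header" else line.y >= 1.0 - margin
            nearby = range(max(0, i - window), min(len(pages), i + window + 1))
            neighbours = [pages[j][index] for j in nearby if j != i and pages[j]]
            best = max((line_similarity(line, n) for n in neighbours), default=0.0)
            # Assumption: a line set in MFONT looks like body text, so demand more evidence.
            needed = min(1.0, threshold + 0.15) if line.font == mfont else threshold
            if in_margin and best >= needed:
                found[role] = line.text
        results.append(found)
    return results


# Hypothetical three-page book: running header on every page, page number on two.
pages = [
    [Line("Digital Libraries 10", "Helvetica-Oblique", 0.04),
     Line("Headers repeat across the pages of a book.", "Times", 0.30),
     Line("10", "Times", 0.96)],
    [Line("Digital Libraries 11", "Helvetica-Oblique", 0.04),
     Line("Body text differs from one page to the next.", "Times", 0.30),
     Line("11", "Times", 0.96)],
    [Line("Digital Libraries 12", "Helvetica-Oblique", 0.04),
     Line("Only the running lines stay similar.", "Times", 0.50)],
]
results = detect_header_footer(pages)
```

Only the first and last line of each page are examined here, which keeps the sketch short; a fuller version would presumably compare several candidate lines per page and tune the thresholds on labelled books.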