Hierarchical Logical Structure Extraction of Book Documents by Analyzing Tables of Contents

F He, XQ Ding, LR Peng
DOI: https://doi.org/10.1117/12.528808
2003-01-01
Abstract: Logical structure extraction of book documents is significant for the automatic construction of electronic document databases. The tables of contents of a book play an important role in representing the overall logical structure and reference information of the book. In this paper, a new method is proposed to extract the hierarchical logical structure of book documents, together with the reference information, by combining the spatial and semantic information of the tables of contents. Experimental results obtained from testing on various book documents demonstrate the effectiveness and robustness of the proposed approach.