A Table of Content Recognition Method of Book Documents Based on Clustering Techniques

Liangcai Gao, Zhi Tang, Xiaofan Lin, Yinyan Yu, Jing Fang
2010-01-01
Abstract: After reviewing the merits and drawbacks of existing ToC (table of contents) recognition methods, the authors describe an automatic ToC recognition method with high efficiency and adaptability. Exploiting the style consistency of ToCs in book documents, the method uses clustering to detect decorative elements and to generate an adaptive ToC model, which is then used to extract ToC entries and their hierarchies. Experimental results show that the method achieves high accuracy and efficiency. In particular, it performs well on complicated ToCs that contain decorative elements, broken lines, and varied hierarchical structures. The method has been successfully applied in a commercial e-book production line.
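The abstract describes the pipeline only at a high level. As a rough illustration of the general idea, and not the authors' algorithm, the Python sketch below strips leader dots (a common decorative element) and trailing page numbers from ToC lines, then clusters the lines' left indents to assign hierarchy levels. It assumes each line's text and left indent have already been extracted by a layout-analysis step. `TocLine`, `TocEntry`, `extract_entries`, `cluster_indents`, and the `gap` threshold are hypothetical names introduced here, and the simple gap-based 1-D clustering on indent alone stands in for whatever clustering method and style features the paper actually uses.

```python
import re
from dataclasses import dataclass

# Minimal sketch (assumption): NOT the method from the paper.
# Hypothetical input: one ToC text line plus its left indent (e.g. in points),
# assumed to come from an earlier layout-analysis step.
@dataclass
class TocLine:
    text: str
    indent: float

@dataclass
class TocEntry:
    title: str
    page: int
    level: int  # 0 = top level

# Leader characters (spaces, dots, middle dots, ellipses, underscores, dashes)
# between the title and the page number are treated as decoration and dropped;
# the trailing integer is taken as the page number.
ENTRY_RE = re.compile(r"^(?P<title>.*?)[\s.\u00b7\u2026_-]*(?P<page>\d+)\s*$")

def cluster_indents(values, gap=10.0):
    """Gap-based 1-D clustering: sorted distinct indents are split into a new
    cluster whenever two neighbours differ by more than `gap`.
    Returns the cluster centers (mean of distinct values) in ascending order."""
    centers, current = [], []
    for v in sorted(set(values)):
        if current and v - current[-1] > gap:
            centers.append(sum(current) / len(current))
            current = []
        current.append(v)
    if current:
        centers.append(sum(current) / len(current))
    return centers

def extract_entries(lines, gap=10.0):
    """Parse lines into entries; level = index of the nearest indent cluster."""
    centers = cluster_indents([ln.indent for ln in lines], gap)
    entries = []
    for ln in lines:
        m = ENTRY_RE.match(ln.text.strip())
        if not m:
            continue  # no trailing page number: not treated as an entry here
        level = min(range(len(centers)), key=lambda i: abs(centers[i] - ln.indent))
        entries.append(TocEntry(m.group("title").strip(), int(m.group("page")), level))
    return entries

if __name__ == "__main__":
    toc = [
        TocLine("Chapter 1 Introduction ........ 1", 0),
        TocLine("1.1 Background ........ 3", 22),
        TocLine("1.2 Outline ........ 5", 21),
        TocLine("Chapter 2 Methods ........ 9", 1),
        TocLine("2.1.1 Clustering .... 12", 43),
    ]
    for e in extract_entries(toc):
        print(e.level, e.title, e.page)
```

With these illustrative indents, the values 0 and 1, 21 and 22, and 43 form three clusters, so the chapters land on level 0, the sections on level 1, and the subsection on level 2. Because lines without a trailing page number are simply skipped, this sketch does not reassemble titles that wrap onto several lines; handling that case, and using richer style features such as font size or numbering patterns, would be needed to approach the behavior the abstract describes.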