AN AUTOMATIC TEXT EXTRACTION METHOD BASED ON STATE MACHINE

Zhu Zhenguang, He Hui, Zhang Hongli, Li Qiao
DOI: https://doi.org/10.3969/j.issn.1000-386x.2012.12.016
2012-01-01
Abstract: Retrieving text content from documents in different formats has become an active topic in Internet research. To extract text from documents as quickly as possible, supply the basic data for content retrieval, and improve overall search efficiency, this paper analyses the Microsoft Office 2007 document format and proposes an automatic text extraction method based on a state machine. Experiments show that the proposed method performs well on text extraction in terms of correctness, memory cost, and time cost.
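The abstract does not describe the state machine itself, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes the input is the WordprocessingML markup of an Office 2007 document (body text inside `<w:t>` elements, paragraphs closed by `</w:p>`); the state names and the `extract_text` function are hypothetical. The scanner makes a single pass over the characters and keeps only a small tag buffer, which is the property that makes this style of extraction cheap in time and memory.

```python
from enum import Enum, auto


class State(Enum):
    OUTSIDE = auto()  # between tags, not inside a text run
    IN_TAG = auto()   # between '<' and '>'
    IN_TEXT = auto()  # inside a <w:t> element, collecting characters


def extract_text(xml: str) -> str:
    """Single-pass scanner (hypothetical sketch, not the paper's method).

    Assumes no '>' appears inside attribute values and does not decode
    XML entities such as &amp;.
    """
    state = State.OUTSIDE
    previous = State.OUTSIDE  # state to resume after an unrelated tag
    tag = []                  # characters of the tag currently being read
    out = []                  # extracted text

    for ch in xml:
        if state is State.IN_TAG:
            if ch != ">":
                tag.append(ch)
                continue
            # End of tag: decide the next state from the tag name.
            raw = "".join(tag).strip()
            self_closing = raw.endswith("/")
            parts = raw.rstrip("/").split()
            name = parts[0] if parts else ""
            if name == "w:t" and not self_closing:
                state = State.IN_TEXT
            elif name == "/w:t":
                state = State.OUTSIDE
            else:
                if name == "/w:p":
                    out.append("\n")  # paragraph boundary
                state = previous
        elif ch == "<":
            previous = state
            tag = []
            state = State.IN_TAG
        elif state is State.IN_TEXT:
            out.append(ch)

    return "".join(out)
```

An Office 2007 `.docx` file is a ZIP package, so in practice the markup would first be read from the package, for example with `zipfile.ZipFile(path).read("word/document.xml").decode("utf-8")`, and then passed to `extract_text`.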