An Automatic Performance Evaluation Method for Document Page Segmentation

Liangrui Peng, Ming Chen, Changsong Liu, Xiaoqing Ding, Jirong Zheng
DOI: https://doi.org/10.1109/ICDAR.2001.953769
2001-01-01
Abstract: Automatic performance evaluation of the document page segmentation module is necessary because OCR products are used to process large volumes of documents with complex layouts, especially newspapers. This paper presents a novel region-based method that evaluates the performance of a page segmentation module by analyzing the geometric relationships between regions in the segmentation results and regions in the preset ground truth. The ground truth is not only the correct answer to page segmentation but also the comparison benchmark for the automatic evaluation, so it is subject to stricter geometric constraints. The region-matching algorithm searches the segmentation results for a region equal to each region in the ground truth. The performance parameters are calculated from the matching results. An experiment tests two page segmentation modules in a popular Chinese OCR product, THOCR2000, and the results show that the method is effective.
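The region-matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the region representation, the equality test, or the performance parameters, so everything here is an assumption made for illustration: regions are modeled as axis-aligned rectangles, "equal" means every edge agrees within a hypothetical pixel tolerance `tol`, and the only parameter computed is a hypothetical `match_rate` (the fraction of ground-truth regions that found an equal region).

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (an assumed representation)."""
    left: int
    top: int
    right: int
    bottom: int


def regions_equal(a: Region, b: Region, tol: int = 0) -> bool:
    """Treat two regions as equal when every edge differs by at most tol pixels.

    The tolerance is an assumption; the abstract only says "equal region".
    """
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def match_regions(ground_truth: List[Region],
                  segmented: List[Region],
                  tol: int = 0) -> List[Optional[int]]:
    """For each ground-truth region, return the index of the first unused
    equal region in the segmentation results, or None if there is none."""
    used = set()
    matches: List[Optional[int]] = []
    for gt in ground_truth:
        found = None
        for i, seg in enumerate(segmented):
            if i not in used and regions_equal(gt, seg, tol):
                found = i
                used.add(i)
                break
        matches.append(found)
    return matches


def match_rate(matches: List[Optional[int]]) -> float:
    """Fraction of ground-truth regions that were matched (hypothetical metric)."""
    if not matches:
        return 0.0
    return sum(m is not None for m in matches) / len(matches)


ground_truth = [Region(0, 0, 100, 50), Region(0, 60, 100, 120)]
# The second ground-truth block was wrongly split into two by the segmenter.
segmented = [Region(1, 0, 100, 51), Region(0, 60, 50, 120), Region(50, 60, 100, 120)]

matches = match_regions(ground_truth, segmented, tol=2)
print(matches, match_rate(matches))
```

In this example the first ground-truth region is matched and the second is not, because the segmenter split it into two smaller regions. A fuller evaluation would presumably classify such cases (splits, merges, misses) separately, but the abstract does not specify how the paper does so.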