A Connected Components Based Layout Analysis Approach for Educational Documents

Ruiying Liu, Shenbao Yu, Fan Yang, Yinghui Pan, Yifeng Zeng
DOI: https://doi.org/10.1109/iccse51940.2021.9569699
2021-01-01
Abstract: Layout analysis, which aims to detect and categorize areas of interest in document images, is an increasingly important part of document image processing. Existing research has addressed layout analysis for various document types, but no method has been proposed for documents produced in teaching, i.e., exam papers and workbooks, which are worth studying. In this paper, we propose a novel layout analysis system that performs two tasks, one for workbook pages and one for exam papers. For workbook pages, we segment text and non-text areas. For exam papers, we extract regions of interest. Our system is based on connected component (CC) analysis: it extracts geometric features and spatial information of CCs to recognize page elements. We carried out experiments on images collected from real-world scenarios, and the promising results confirm the applicability and effectiveness of our system.
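To make the CC-based idea concrete, the following is a minimal sketch of connected component extraction on a binarized page, followed by a few geometric features per component. The abstract does not state which features or which connectivity the system uses, so the 8-connectivity, the feature set (bounding box, area, aspect ratio, density), and the function names `connected_components` and `geometric_features` are all assumptions made for illustration, not the authors' method.

```python
from collections import deque


def connected_components(binary):
    """Group foreground pixels (value 1) of a 2D list into 8-connected components.

    Hypothetical helper: the connectivity choice is an assumption.
    """
    rows = len(binary)
    cols = len(binary[0]) if binary else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def geometric_features(pixels):
    """Compute illustrative geometric features for one component (assumed feature set)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1
    width = max(xs) - left + 1
    return {
        "bbox": (left, top, width, height),          # x, y, width, height
        "area": len(pixels),                         # number of foreground pixels
        "aspect_ratio": width / height,              # wide vs. tall components
        "density": len(pixels) / (width * height),   # fill ratio of the bounding box
    }
```

A classifier for page elements could then apply thresholds or a learned model to these per-component features together with their positions on the page; which rules the paper actually uses is not stated in the abstract.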