A Text Line Detection Method for Mathematical Formula Recognition

Xiaoyan Lin,Liangcai Gao,Zhi Tang,Josef Baker,Mohamed Alkalai,Volker Sorge
DOI: https://doi.org/10.1109/ICDAR.2013.75
2013-01-01
Abstract:Text line detection is a prerequisite procedure of mathematical formula recognition, however, many incorrectly segmented text lines are often produced due to the two-dimensional structures of mathematics when using existing segmentation methods such as Projection Profiles Cutting or white space analysis. In consequence, mathematical formula recognition is adversely affected by these incorrectly detected text lines, with errors propagating through further processes. Aimed at mathematical formula recognition, we propose a text line detection method to produce reliable line segmentation. Based on the results produced by PPC, a learning based merging strategy is presented to combine incorrectly split text lines. In the merging strategy, the features of layout and text for a text line and those between successive lines are utilised to detect the incorrectly split text lines. Experimental results show that the proposed approach obtains good performance in detecting text lines from mathematical documents. Furthermore, the error rate in mathematical formula identification is reduced significantly through adopting the proposed text line detection method.
What problem does this paper attempt to address?