Extraction of Mathematical Expressions in Printed Chinese Technical Documents

ZHANG Zhi-wei, KONG Fan-rang, LIU Wei-lai, LONG Qian, LIU Yong-bin
DOI: https://doi.org/10.3969/j.issn.1003-0077.2007.04.013
2007-01-01
Abstract: Extracting mathematical expressions is the first step in recognizing them. This paper presents a new approach for separating both isolated and embedded expressions in printed Chinese technical documents. After features are extracted from each text line, an adaptive neuro-fuzzy inference system (ANFIS) classifies the lines into two classes: lines of text and lines of isolated expressions. For embedded expressions, fuzzy clustering and a dynamic programming algorithm are applied to extract Chinese characters, Chinese punctuation marks, and English letters, in that order. Finally, heuristic rules merge the mathematical symbols into expressions. Experiments show that the proposed methods achieve high accuracy.
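The abstract does not say which features or which clustering variant the authors use, so the following is only a minimal sketch of the general idea behind the fuzzy clustering step. It assumes, hypothetically, that each glyph in a text line is described by a single feature (its bounding-box width in pixels) and applies plain fuzzy c-means with two clusters to separate wide glyphs (candidate Chinese characters) from narrow ones (candidate letters and symbols). The function names, the width feature, and the parameter values are all illustrative and do not come from the paper.

```python
def memberships_1d(values, centers, m=2.0):
    """Return one row of fuzzy memberships per value (one column per center)."""
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if min(dists) == 0.0:
            # The value sits exactly on one or more centers: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            # Standard fuzzy c-means membership: inverse distance raised to 2/(m-1).
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50):
    """Cluster scalar features with fuzzy c-means; requires n_clusters >= 2."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the smallest and largest value.
    centers = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    for _ in range(n_iter):
        u = memberships_1d(values, centers, m)
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                # No value supports this cluster: keep its previous center.
                new_centers.append(centers[k])
            else:
                new_centers.append(sum(w * x for w, x in zip(weights, values)) / total)
        centers = new_centers
    return centers, memberships_1d(values, centers, m)


def hard_labels(membership_rows):
    """Assign each value to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in membership_rows]
```

Under these assumptions, calling `fuzzy_c_means_1d` on the glyph widths of one line and then `hard_labels` on the returned memberships gives each glyph a cluster index, with the higher-indexed cluster holding the wider glyphs. A real system would likely combine several features (height, aspect ratio, pixel density) rather than width alone.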