Handwritten Mathematical Expression Recognition: An approach on data augmentation

Khanh-Ngoc Bui, Quoc-Kim-Hoang Nguyen, Thanh-Sach Le
DOI: https://doi.org/10.1109/acomp53746.2021.00013
2021-11-01
Abstract: In this paper, we propose an approach for generating Mathematical Expression (ME) images from the CROHME dataset. Our approach employs two methods. The first applies geometric transformations to the original ME images in the CROHME dataset. The second generates new ME images from a dictionary of character patterns collected from the CROHME dataset; the generated ME images follow the rules of mathematical notation. By combining the two methods, we introduce a dataset for handwritten mathematical expression recognition that is much larger than the original CROHME. This dataset is the main contribution of the paper. For evaluation, we employ a sequential system consisting of an object detection module, the Single Shot MultiBox Detector (SSD), and a module that parses the SSD output into a LaTeX string, DRACULAE, and we focus on improving the detector. We trained and evaluated the system on the CROHME 2013 training set, both with and without our generated dataset, to show the impact of our generative approach. The experimental results indicate that the detector achieves 52.57% mAP with the added dataset, compared with 36.98% without it.
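As an illustration of the first method, the following is a minimal sketch of how a geometric transformation might be applied to symbol bounding boxes so that detector labels stay aligned with a transformed image. The abstract does not state which transformations or parameters are used; the rotation, uniform scale, and horizontal shear here, and the helper names `affine_matrix` and `transform_boxes`, are assumptions made for illustration only.

```python
import numpy as np


def affine_matrix(angle_deg=0.0, scale=1.0, shear=0.0):
    """Build a 2x2 matrix: horizontal shear, then rotation, then uniform scale.

    The choice of these three transformations is an assumption; the source
    only says "geometric transformations".
    """
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    shr = np.array([[1.0, shear],
                    [0.0, 1.0]])
    return (scale * rot) @ shr


def transform_boxes(boxes, matrix):
    """Map boxes (x_min, y_min, x_max, y_max) through `matrix`.

    Each box is expanded to its four corners, the corners are transformed,
    and the tightest axis-aligned box around them is returned.
    """
    boxes = np.asarray(boxes, dtype=float)
    x0, y0, x1, y1 = boxes.T
    # Corners per box, shape (n_boxes, 4, 2).
    corners = np.stack([
        np.stack([x0, y0], axis=1),
        np.stack([x1, y0], axis=1),
        np.stack([x1, y1], axis=1),
        np.stack([x0, y1], axis=1),
    ], axis=1)
    moved = corners @ matrix.T
    return np.concatenate([moved.min(axis=1), moved.max(axis=1)], axis=1)


if __name__ == "__main__":
    # One hypothetical symbol box, sheared slightly and rotated by 5 degrees.
    symbol_boxes = [[10.0, 20.0, 40.0, 60.0]]
    print(transform_boxes(symbol_boxes, affine_matrix(angle_deg=5.0, shear=0.1)))
```

The same matrix would also have to be applied to the image pixels (or stroke coordinates) so that boxes and symbols remain consistent; that step is omitted here.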