Visual-Textual Attention for Tree-Based Handwritten Mathematical Expression Recognition

Wei Liao, Jiayi Liu, Jianghan Chen, Qiu-Feng Wang, Kaizhu Huang
DOI: https://doi.org/10.1007/978-981-97-1417-9_35
2024-01-01
Abstract: Handwritten mathematical expression recognition (HMER) has attracted much attention and achieved remarkable progress under the encoder-decoder framework. However, it remains challenging due to complex structures and illegible handwriting. In this paper, we propose to refine the encoder-decoder framework for HMER. First, we propose a multi-scale visual and textual attention fusion mechanism to enhance the context from both spatial and semantic information. Second, most HMER works simply treat HMER as a sequence-to-sequence problem (i.e., predicting a LaTeX string), ignoring the structural information in mathematical expressions. To overcome this issue, we utilize a tree decoder to capture such structural context. Furthermore, we propose a parent-children mutual learning method to enhance the learning of our encoder-decoder model. Extensive experiments on the HMER benchmark datasets CROHME 2014, 2016, and 2019 demonstrate the effectiveness of the proposed method.
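To make the contrast between a flat LaTeX string and a tree-structured target concrete, the following minimal sketch represents the expression x^{2} + 1 both ways. The `Node` class, the `triples` helper, and the relation labels (`start`, `sup`, `right`) are illustrative assumptions and are not taken from the paper, which may encode its trees differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One symbol in an expression tree (hypothetical structure, for illustration)."""
    symbol: str
    relation: str = "start"  # relation to the parent, e.g. "right", "sup", "sub"
    children: List["Node"] = field(default_factory=list)

    def add(self, symbol: str, relation: str) -> "Node":
        """Attach a child symbol with the given relation and return it."""
        child = Node(symbol, relation)
        self.children.append(child)
        return child


def triples(node: Node, parent: str = "<s>") -> Iterator[Tuple[str, str, str]]:
    """Yield (child, parent, relation) triples in depth-first order."""
    yield (node.symbol, parent, node.relation)
    for child in node.children:
        yield from triples(child, node.symbol)


# Sequence view: the expression x^{2} + 1 as a flat list of LaTeX tokens.
flat_tokens = ["x", "^", "{", "2", "}", "+", "1"]

# Tree view: the same expression with explicit parent-child relations.
root = Node("x")
root.add("2", "sup")          # 2 is the superscript of x
plus = root.add("+", "right")  # + sits to the right of x
plus.add("1", "right")         # 1 sits to the right of +

tree_triples = list(triples(root))
for child, parent, relation in tree_triples:
    print(f"{child!r} is {relation!r} of {parent!r}")
```

In the flat view, the superscript relation is only implied by the `^`, `{`, and `}` tokens. In the tree view, it is an explicit label on the edge between parent and child, which is the kind of structural context a tree decoder can predict directly.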