Multi-modal Attention Network for Handwritten Mathematical Expression Recognition.

Jia-Ming Wang, Jun Du, Jianshu Zhang, Zi-Rui Wang
DOI: https://doi.org/10.1109/ICDAR.2019.00191
2019-01-01
Abstract: In this paper, we propose a novel multi-modal attention network (MAN), based on the encoder-decoder framework, for handwritten mathematical expression recognition (HMER). Here, multi-modal refers to two specific modalities, online and offline: the online modality takes dynamic trajectories as input, and the offline modality takes static images as input. More specifically, the proposed method first feeds the dynamic trajectories and static images into the online and offline channels of the multi-modal encoder, respectively. The output of the encoder is then passed to the multi-modal decoder, which generates a LaTeX sequence as the recognition result for the mathematical expression. To make full use of the complementary information from the two modalities, we propose a re-attention mechanism, an enhanced version of the multi-modal attention mechanism that further improves recognition performance. Evaluated on the benchmark published by the CROHME competition, the proposed approach achieves an expression recognition accuracy of 54.05% on CROHME 2014 and 50.56% on CROHME 2016, substantially outperforming state-of-the-art methods that use only a single online or offline modality.
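The idea of attending to two modalities at each decoding step can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not specify the scoring function, so plain dot-product attention is swapped in here as an assumption, and the re-attention mechanism is not modeled at all. The names `attend`, `multimodal_context`, `online_feats`, and `offline_feats` are hypothetical, as are the toy feature values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention (an assumed stand-in for the paper's scorer).

    Returns the weighted sum of the feature vectors and the weights.
    """
    scores = [sum(q * f for q, f in zip(query, vec)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


def multimodal_context(query, online_feats, offline_feats):
    """Attend to each modality separately, then concatenate the contexts."""
    on_ctx, on_w = attend(query, online_feats)
    off_ctx, off_w = attend(query, offline_feats)
    return on_ctx + off_ctx, on_w, off_w


# Toy data: 3 trajectory-point features and 2 image-region features.
query = [1.0, 0.0]  # hypothetical decoder state
online_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
offline_feats = [[0.0, 1.0], [2.0, 0.0]]

ctx, on_w, off_w = multimodal_context(query, online_feats, offline_feats)
```

In this sketch the decoder would consume `ctx`, a vector that combines one context per modality, when predicting the next LaTeX token. Concatenation is only one way to fuse the two contexts and is chosen here for simplicity.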