Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition
Cong Yao, Dezhi Peng, Zhe Li, Mengchao He, Wentao Yang, Lianwen Jin
DOI: https://doi.org/10.1145/3581783.3612499
2023-10-26
Abstract: Handwritten Mathematical Expression Recognition (HMER) plays a critical role in various applications, such as digitized education and scientific research. Although existing methods have achieved promising performance on publicly available datasets, they still struggle to recognize multi-line mathematical expressions (MEs), which have complex structures, and they suffer from slow inference speed. To address these issues, we propose a Line-Aware Semi-autoregressive Transformer (LAST) that treats multi-line mathematical expression sequences as two-dimensional dual-end structures. LAST uses a line-wise dual-end decoding strategy that decodes the lines of a multi-line expression in parallel and performs dual-end decoding within each line. Specifically, we introduce a line-aware positional encoding module and a line-partitioned dual-end mask to give LAST awareness of line order and decoding direction. Additionally, we adopt a shared-task optimization strategy to train LAST on both autoregressive and semi-autoregressive tasks. To evaluate the effectiveness of our approach in real-world scenarios, we have built a new Multi-line Mathematical Expression dataset (M2E), which, to the best of our knowledge, is the first dataset of its kind and, compared with existing ME datasets, has the largest number of character categories, the largest number of character samples, and the longest average sequence length. Experimental results on both the M2E dataset and publicly available datasets demonstrate the effectiveness of the proposed method. Notably, our semi-autoregressive decoding approach decodes significantly faster than existing methods while still achieving state-of-the-art performance.
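The abstract does not spell out how line-wise dual-end decoding or the line-partitioned mask are constructed. The following is a minimal Python sketch built on our own assumptions, not the LAST implementation. It shows one way such a schedule could be organized: at step s, every line fills position s from its left end and position n-1-s from its right end, and all lines advance together. The function names `decoding_schedule`, `num_steps`, and `line_partitioned_dual_end_mask` are hypothetical. The masking rule is also an assumption: a token may attend only to itself and to tokens in the same line that were decoded at an earlier step.

```python
from typing import List, Tuple


def decoding_schedule(line_lengths: List[int]) -> List[Tuple[int, int, int]]:
    """Return (line_index, position_in_line, decoding_step) for every token.

    Hypothetical dual-end schedule: within a line of length n, step s fills
    position s from the left end and position n - 1 - s from the right end.
    All lines use the same step counter, so they are decoded in parallel.
    """
    schedule = []
    for line, n in enumerate(line_lengths):
        for pos in range(n):
            step = min(pos, n - 1 - pos)  # distance to the nearer end of the line
            schedule.append((line, pos, step))
    return schedule


def num_steps(line_lengths: List[int]) -> int:
    """Parallel steps needed: a line of n tokens takes ceil(n / 2) steps."""
    return max(((n + 1) // 2 for n in line_lengths), default=0)


def line_partitioned_dual_end_mask(line_lengths: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Assumed rule: q sees itself, plus any token k in the same line that was
    decoded at a strictly earlier step. Tokens in other lines stay hidden.
    """
    schedule = decoding_schedule(line_lengths)
    mask = []
    for q, (q_line, _, q_step) in enumerate(schedule):
        row = []
        for k, (k_line, _, k_step) in enumerate(schedule):
            row.append(k == q or (k_line == q_line and k_step < q_step))
        mask.append(row)
    return mask


if __name__ == "__main__":
    lengths = [5, 2]  # a toy expression: one line of 5 tokens, one of 2
    print(num_steps(lengths))
    print([step for _, _, step in decoding_schedule(lengths)])
```

For the toy input, the longest line needs 3 parallel steps, versus 7 steps if the 7 tokens were emitted one at a time. This kind of step reduction is what the abstract credits for the faster decoding, although the actual schedule and mask in LAST may differ.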
Computer Science, Mathematics