Abstract: Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios, such as digitized education and automated offices. Recently, sequence-based models with encoder-decoder architectures have been commonly adopted to address this task by directly predicting LaTeX sequences of expression images. However, these methods only implicitly learn the syntax rules provided by LaTeX, and so may fail to describe the positional and hierarchical relationships between symbols, given complex structural relations and diverse handwriting styles. To overcome this challenge, we propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks, expression recognition and position recognition, to explicitly enable position-aware symbol feature representation learning. Specifically, we first design a position forest that models the mathematical expression as a forest structure and parses the relative position relationships between symbols. Without requiring extra annotations, each symbol is assigned a position identifier in the forest to denote its relative spatial position. Second, we propose an implicit attention correction module to accurately capture attention for HMER in the sequence-based decoder architecture. Extensive experiments validate the superiority of PosFormer, which consistently outperforms the state-of-the-art methods, with gains of 2.03%/1.22%/2.00%, 1.83%, and 4.62% on the single-line CROHME 2014/2016/2019, multi-line M2E, and complex MNE datasets, respectively, with no additional latency or computational cost. Code is available at https://github.com/SJTU-DeepVisionLab/PosFormer.
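To make the idea of a position identifier concrete, here is a minimal sketch of how identifiers could be derived from a tokenized LaTeX sequence without extra annotations. It is not the authors' implementation: the function name `position_identifiers`, the letters `M` (main line), `U` (upper) and `D` (lower), and the restriction to `^`, `_` and brace groups are all assumptions made for illustration.

```python
def position_identifiers(tokens):
    """Assign each symbol a path-like identifier (hypothetical scheme).

    'M' is the main baseline; '^' adds an upper branch 'U' and '_' adds a
    lower branch 'D'. Only scripts and brace groups are handled here.
    """
    result = []
    stack = ["M"]    # identifier of the current nesting context
    pending = None   # branch letter waiting for its argument
    for tok in tokens:
        if tok in ("^", "_"):
            pending = "U" if tok == "^" else "D"
        elif tok == "{":
            # A brace group inherits the context, extended by any pending branch.
            stack.append(stack[-1] + (pending or ""))
            pending = None
        elif tok == "}":
            stack.pop()
        elif pending:
            # Single-token script such as x^2.
            result.append((tok, stack[-1] + pending))
            pending = None
        else:
            result.append((tok, stack[-1]))
    return result


# x^{a_b} + y, already split into tokens
example = position_identifiers(["x", "^", "{", "a", "_", "b", "}", "+", "y"])
```

In this sketch the length of an identifier minus one gives the nesting depth of a symbol, and its last letter gives the position relative to the enclosing symbol, which is the kind of information a position recognition task could supervise.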

What problem does this paper attempt to address?

The paper addresses two major challenges in Handwritten Mathematical Expression Recognition (HMER): the complexity of relationships between symbols and the diversity of handwriting styles. Existing methods that directly predict LaTeX sequences learn the grammar rules only implicitly, so they can fail to describe the positional and hierarchical relationships between symbols accurately.

To overcome these challenges, the authors propose the **Position Forest Transformer (PosFormer)**, which improves HMER through two key components:

1. **Position Forest Encoding**: Encode the LaTeX mathematical expression sequence into a position forest structure, where each symbol is assigned a position identifier that represents its relative spatial position in the 2D image. This helps parse the nested hierarchy and relative positions between symbols, and so supports symbol-level feature representation learning for complex nested expressions.
2. **Implicit Attention Correction Module**: Introduce an implicit attention correction module in the attention-based decoder architecture. By adaptively incorporating zero attention as a refinement term, it uses past alignment information to refine the attention weights, thereby improving recognition accuracy.

Experimental results show that PosFormer outperforms existing methods on multiple public datasets, with gains of 2.03%/1.22%/2.00% on the single-line CROHME 2014/2016/2019 datasets, 1.83% on the multi-line M2E dataset, and 4.62% on the complex MNE dataset. The method requires no extra annotation work and adds no latency or computational cost at inference time.
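The attention step described above (zero attention as a refinement term, combined with past alignment information) can be sketched as a coverage-style correction. The exact formulation is not given in this text, so the following is one plausible reading rather than the paper's module: the names `corrected_attention` and `STRUCTURAL`, the `penalty` parameter, and the choice to subtract accumulated past attention from the raw scores are all assumptions.

```python
import math

# Assumed set of structure tokens that have no visual entity in the image.
STRUCTURAL = {"^", "_", "{", "}"}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_attention(raw_scores, past_attention, past_tokens, penalty=1.0):
    """Down-weight image positions already attended to by earlier steps.

    raw_scores:     unnormalized scores over image positions (current step)
    past_attention: attention distributions from earlier decoding steps
    past_tokens:    tokens decoded at those steps
    Steps that produced a structural token contribute zero attention to
    the accumulated history (hypothetical reading of "zero attention").
    """
    n = len(raw_scores)
    coverage = [0.0] * n
    for attn, tok in zip(past_attention, past_tokens):
        if tok in STRUCTURAL:
            continue  # zero attention: nothing is added for this step
        for j in range(n):
            coverage[j] += attn[j]
    corrected = [s - penalty * c for s, c in zip(raw_scores, coverage)]
    return softmax(corrected)


# Three image positions; "x" attended to position 0, "^" to position 1.
weights = corrected_attention(
    raw_scores=[0.0, 0.0, 0.0],
    past_attention=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    past_tokens=["x", "^"],
)
```

Here position 0 is penalized because an entity symbol already aligned with it, while position 1 is not, since the step that attended to it decoded a structural token.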

PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
