Abstract: Handwritten mathematical expression recognition (HMER) poses a formidable challenge due to the intricate two-dimensional structures and diverse handwriting styles. This paper introduces a novel approach to improve HMER accuracy by employing an integrated, high-capacity architecture that combines Transformer and Convolutional Neural Network (CNN) models, along with a denoising diffusion probabilistic model (DDPM)-based data augmentation technique. We explore three combination strategies for an attention-based encoder-decoder (AED) HMER model: 1) The "Tandem" strategy, which harnesses CNN features within a Transformer encoder to capture global interdependencies; 2) The "Parallel" strategy, which integrates Transformer encoder outputs with CNN outputs to achieve comprehensive feature fusion; 3) The "Mixing" strategy, which introduces multi-head self-attention (MHSA) at the final stage of the CNN. We evaluate these methods using the CROHME benchmark dataset and conduct a detailed comparative analysis. All three approaches significantly enhance model performance. Notably, the "Tandem" approach achieves expression recognition rates (ExpRate) of 54.85% and 58.56% on the CROHME 2016 and 2019 test sets, respectively, while the "Parallel" method attains 55.63% and 57.39% on the same test sets. Furthermore, we introduce an innovative data augmentation approach that utilizes DDPM to generate synthetic training samples. The DDPM, conditioned on LaTeX-rendered images, bridges the gap between printed and handwritten expressions, enabling the creation of realistic, stylistically diverse handwriting samples. This augmentation boosts the ExpRates of all strategies on both CROHME 2016 and 2019 test sets, yielding improvements of 1.6-4.6% relative to the unaugmented dataset.
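The ExpRate figures quoted above are exact-match rates: an expression counts as recognized only when the whole predicted LaTeX token sequence equals the reference. A minimal sketch of that computation follows. The function name `exp_rate`, the pre-tokenized input format, and the toy data are assumptions made for illustration and are not taken from the paper or from the CROHME evaluation tooling.

```python
from typing import Sequence


def exp_rate(predictions: Sequence[Sequence[str]],
             references: Sequence[Sequence[str]]) -> float:
    """Percentage of expressions whose predicted token sequence
    exactly matches the reference token sequence."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression is correct only if every token matches, in order.
    correct = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy data (hypothetical, not drawn from CROHME): tokens are assumed to be
# already split into LaTeX symbols.
refs = [
    ["x", "^", "{", "2", "}"],
    ["\\frac", "{", "a", "}", "{", "b", "}"],
    ["a", "+", "b"],
    ["\\sqrt", "{", "x", "}"],
]
preds = [
    ["x", "^", "{", "2", "}"],
    ["\\frac", "{", "a", "}", "{", "b", "}"],
    ["a", "-", "b"],  # a single wrong symbol makes the whole expression wrong
    ["\\sqrt", "{", "x", "}"],
]

print(f"ExpRate: {exp_rate(preds, refs):.2f}%")  # prints "ExpRate: 75.00%"
```

Because the metric is all-or-nothing per expression, a single misrecognized symbol costs the full expression, which is one reason gains of a few percentage points on CROHME are treated as meaningful.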

Multi-modal Attention Network for Handwritten Mathematical Expression Recognition

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

Handwritten Mathematical Expression Recognition via Attention Aggregation Based Bi-directional Mutual Learning

Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

Watch, Attend and Parse: an End-to-end Neural Network Based Approach to Handwritten Mathematical Expression Recognition

A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition

NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention

Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition

Improving Handwritten Mathematical Expression Recognition via Integrating Convolutional Neural Network With Transformer and Diffusion-Based Data Augmentation

DGNet: A Handwritten Mathematical Formula Recognition Network Based on Deformable Convolution and Global Context Attention

Viewing Writing As Video: Optical Flow Based Multi-Modal Handwritten Mathematical Expression Recognition

Improving Handwritten Mathematical Expression Recognition Via Similar Symbol Distinguishing

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition

Symbol Location-Aware Network for Improving Handwritten Mathematical Expression Recognition

Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition

PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer

Offline handwritten mathematical expression recognition with graph encoder and transformer decoder

Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition

Relative Position Embedding Asymmetric Siamese Network for Offline Handwritten Mathematical Expression Recognition