Offline handwritten mathematical expression recognition with graph encoder and transformer decoder

Jia-Man Tang, Hong-Yu Guo, Jin-Wen Wu, Fei Yin, Lin-Lin Huang
DOI: https://doi.org/10.1016/j.patcog.2023.110155
IF: 8
2023-11-29
Pattern Recognition
Abstract: Handwritten mathematical expression recognition (HMER) has attracted extensive attention. Despite the significant progress achieved in recent years through deep learning approaches, HMER remains a challenge due to the complex spatial structure of expressions and variable writing styles. Encoder–decoder models with attention mechanisms, which treat HMER as an image-to-sequence (i.e. LaTeX) generation task, have boosted accuracy but suffer from low interpretability because the symbols are not segmented explicitly. Symbol segmentation is desirable for facilitating post-processing and human interaction in real applications. In this paper, we formulate the mathematical expression as a graph and propose a Graph-Encoder-Transformer-Decoder (GETD) approach for HMER. To construct the graph from the input image, candidate symbols are first detected using an object detector and represented as the nodes of a graph, called the symbol graph, whose edges encode the between-symbol relationships. The spatial information is aggregated by a graph neural network (GNN), and a Transformer-based decoder is used to identify the symbol classes and structure from the graph. Experiments on public datasets demonstrate that our GETD model achieves competitive expression recognition performance while offering good interpretability compared with previous methods.
Categories: computer science, artificial intelligence; engineering, electrical & electronic
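
The graph-construction step described in the abstract (detected symbols become nodes, edges carry between-symbol relationships, and spatial information is aggregated over the graph) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's method: the abstract does not say how edges are chosen or what features they carry, so this sketch links each symbol to its k nearest neighbours by box-centre distance, uses the relative offset as the edge feature, and replaces the learned GNN with a single round of plain mean aggregation. The function names and the sample boxes are hypothetical.

```python
import math


def center(box):
    """Centre point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def build_symbol_graph(boxes, k=2):
    """Build a directed symbol graph from detected boxes.

    Nodes are box centres. Each node gets an edge to its k nearest
    neighbours (an assumed rule, not taken from the paper); the edge
    feature is the neighbour's offset relative to the node.
    """
    centers = [center(b) for b in boxes]
    edges = {}
    for i, (cx, cy) in enumerate(centers):
        by_distance = sorted(
            (math.hypot(cx - ox, cy - oy), j)
            for j, (ox, oy) in enumerate(centers)
            if j != i
        )
        for _, j in by_distance[:k]:
            ox, oy = centers[j]
            edges[(i, j)] = (ox - cx, oy - cy)
    return centers, edges


def aggregate(node_feats, edges):
    """One round of mean aggregation over each node and its out-neighbours.

    A stand-in for a learned GNN layer: no weights, no nonlinearity.
    """
    updated = []
    for i, feat in enumerate(node_feats):
        neighbours = [node_feats[j] for (src, j) in edges if src == i]
        group = [feat] + neighbours
        updated.append(tuple(sum(vals) / len(group) for vals in zip(*group)))
    return updated


# Hypothetical detections for "x^2 + 1": base, superscript, operator, digit.
boxes = [(0, 10, 10, 20), (11, 2, 16, 8), (20, 10, 30, 20), (34, 10, 40, 20)]
centers, edges = build_symbol_graph(boxes, k=1)
feats = aggregate(centers, edges)
```

In this toy layout the base symbol and its superscript link to each other, with a negative vertical offset on the edge from base to superscript, which is the kind of spatial cue a decoder could use to recover structure. In a full system the aggregated node features would feed a Transformer-based decoder, which is not shown here.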