Image-to-LaTeX Converter for Mathematical Formulas and Text

Daniil Gurgurov, Aleksey Morshnev

2024-08-08

Abstract: In this project, we train a vision encoder-decoder model to generate LaTeX code from images of mathematical formulas and text. Utilizing a diverse collection of image-to-LaTeX data, we build two models: a base model with a Swin Transformer encoder and a GPT-2 decoder, trained on machine-generated images, and a fine-tuned version enhanced with Low-Rank Adaptation (LoRA) trained on handwritten formulas. We then compare the BLEU performance of our specialized model on a handwritten test set with other similar models, such as Pix2Text, TexTeller, and Sumen. Through this project, we contribute open-source models for converting images to LaTeX and provide from-scratch code for building these models with distributed training and GPU optimizations.
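The abstract names LoRA as the step that turns the printed-formula base model into a handwriting model. The following is a minimal sketch of the idea for a single linear layer. The pretrained weight W stays frozen, and only a low-rank update (alpha / r) * B A is trained. B is initialized to zero, so fine-tuning starts from exactly the base model's behavior. The class `LoRALinear`, the helpers `matmul`, `matvec`, and `trainable_params`, and the rank and scaling values are hypothetical. The abstract does not state which Swin or GPT-2 layers received adapters or what rank was used. A real model would typically use a library such as Hugging Face PEFT rather than pure-Python matrices, and that is not necessarily what the authors used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W (d_out x d_in) plus trainable B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight              # frozen pretrained weight, never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank         # LoRA scaling factor alpha / r
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer initially computes exactly W @ x.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """Compute W x + (alpha / r) * B (A x) without materializing B @ A."""
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self):
        """Fold the low-rank update into W for inference: W + (alpha / r) * B @ A."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning parameter count, LoRA parameter count) for one layer."""
    return d_out * d_in, rank * (d_in + d_out)
```

Consider a hypothetical 768-by-768 projection, which matches the hidden size of GPT-2 small, with rank 8. Here `trainable_params(768, 768, 8)` gives 589,824 weights for full fine-tuning versus 12,288 LoRA weights, about 2% of the layer. Because `merged_weight()` folds the update back into W, the adapted layer adds no inference cost once training is finished.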

Computation and Language, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to develop an image-to-LaTeX converter, named Im2Latex, that converts images of mathematical formulas and text into LaTeX code. It addresses the following points:

1. **A new image-to-LaTeX conversion method**: The authors build a vision encoder-decoder model from a Swin Transformer encoder and a GPT-2 decoder that maps images directly to LaTeX code.
2. **Recognition of complex mathematical formulas**: Using a Swin Transformer as the encoder lets the model handle images of complex formulas, with the aim of improving recognition accuracy.
3. **Recognition of both printed and handwritten formulas**: A base model is first trained on machine-generated (printed) formula images. It is then fine-tuned with Low-Rank Adaptation (LoRA) so that it also recognizes handwritten formulas.
4. **Comparison with existing models**: The authors compare the BLEU score of their handwriting-specialized model on a handwritten test set against similar models such as Pix2Text, TexTeller, and Sumen.
5. **Open-source resources**: The authors release their code and trained models, including from-scratch code for distributed training with GPU optimizations. The aim is to support further OCR research on mathematical and scientific documents.

In summary, the paper proposes an image-to-LaTeX conversion method for printed and handwritten input and evaluates it experimentally against existing tools.
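The comparison with Pix2Text, TexTeller, and Sumen is reported in BLEU. The following is a minimal sketch of corpus-level BLEU over LaTeX token sequences. It uses clipped n-gram precision up to 4-grams, a geometric mean of those precisions, and a brevity penalty. The tokenizer regex and the names `tokenize_latex`, `ngrams`, and `corpus_bleu` are assumptions for illustration, as is the choice of unsmoothed corpus-level BLEU. The summary does not say how the authors tokenized LaTeX or which BLEU implementation they used. In practice, established implementations such as sacrebleu or NLTK's `corpus_bleu` are the usual choice.

```python
import math
import re
from collections import Counter

# Hypothetical LaTeX tokenizer: control words such as \frac, backslash plus one
# character (e.g. \{ or \\), and every other non-space character as its own token.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize_latex(src):
    return TOKEN_RE.findall(src)


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per candidate and uniform n-gram weights."""
    clipped = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tok, r_tok = tokenize_latex(cand), tokenize_latex(ref)
        cand_len += len(c_tok)
        ref_len += len(r_tok)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tok, n), ngrams(r_tok, n)
            # Clip each candidate n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            totals[n - 1] += sum(c_counts.values())
    if min(clipped) == 0:
        return 0.0  # no smoothing: any zero n-gram precision makes BLEU zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: candidates shorter than the references are penalized.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)
```

For example, scoring the prediction `\frac{a}{c}` against the reference `\frac{a}{b}` works out as follows:

- 6 of 7 unigrams match.
- 4 of 6 bigrams match.
- 3 of 5 trigrams match.
- 2 of 4 four-grams match.

This gives a BLEU of about 0.64. Tokenization changes the n-gram counts, so BLEU scores are only comparable across models when they are computed with the same tokenizer.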
