Shomikoron: Dataset to discover equations from Bangla Mathematical text

Tanjim Taharat Aurpa, Kazi Noshin Fariha, Kawser Hossain
DOI: https://doi.org/10.1016/j.dib.2024.110742
2024-07-17
Abstract: Equation Recognition is the task of identifying equations in mathematical text, and it is important for building a range of mathematical systems. In this paper, we introduce a novel Bangla mathematical equation dataset of 3430 observations, intended to advance Equation Recognition in the Bangla language. To the best of our knowledge, no such dataset has previously been developed for recognizing equations from text. Each entry pairs a mathematical statement with its corresponding equation. The resource can support research in Equation Recognition, including the identification of common mathematical operations (addition, subtraction, multiplication, division, and roots) and of numerical values. With minor adjustments, researchers can also explore combinations of these operations and values. The dataset is provided raw in CSV format with two columns, "Text" and "Equation", which makes it easy to use in deep learning and machine learning tasks.
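As a rough illustration of the two-column layout described in the abstract, the sketch below reads statement and equation pairs from a CSV source in Python. Only the column names "Text" and "Equation" come from the abstract. The helper name `load_pairs` is hypothetical, and the sample row is an English placeholder invented for the demonstration, not an entry from the dataset.

```python
import csv
import io

# Column names as stated in the abstract.
EXPECTED_COLUMNS = ("Text", "Equation")


def load_pairs(source):
    """Read (statement, equation) pairs from an open CSV file-like object.

    Hypothetical helper for illustration; it is not part of the dataset release.
    """
    reader = csv.DictReader(source)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Strip surrounding whitespace; treat absent cells as empty strings.
    return [
        ((row["Text"] or "").strip(), (row["Equation"] or "").strip())
        for row in reader
    ]


# Placeholder row made up for this demo (NOT taken from the dataset).
sample = io.StringIO('Text,Equation\n"the sum of 2 and 3",2+3\n')
pairs = load_pairs(sample)
print(pairs)
```

With the real file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved. UTF-8 is an assumption here, since the abstract does not state the file's encoding.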