Abstract: Since the introduction of the original BERT (i.e., BASE BERT), researchers have developed various customized BERT models with improved performance for specific domains and tasks by exploiting the benefits of transfer learning. Due to the nature of mathematical texts, which often use domain-specific vocabulary along with equations and math symbols, we posit that the development of a new BERT model for mathematics would be useful for many mathematical downstream tasks. In this resource paper, we introduce our multi-institutional effort (i.e., two learning platforms and three academic institutions in the US) toward this need: MathBERT, a model created by pre-training the BASE BERT model on a large mathematical corpus ranging from pre-kindergarten (pre-k), to high school, to college graduate-level mathematical content. In addition, we select three general NLP tasks that are often used in mathematics education: prediction of knowledge component, auto-grading open-ended Q&A, and knowledge tracing, to demonstrate the superiority of MathBERT over BASE BERT. Our experiments show that MathBERT outperforms prior best methods by 1.2-22% and BASE BERT by 2-8% on these tasks. In addition, we build a mathematics-specific vocabulary, 'mathVocab', to train with MathBERT. We discover that MathBERT pre-trained with 'mathVocab' outperforms MathBERT trained with the BASE BERT vocabulary (i.e., 'origVocab'). MathBERT is currently being adopted at the participating learning platforms: Stride, Inc., a commercial educational resource provider, and ASSISTments.org, a free online educational platform. We release MathBERT for public use at: https://github.com/tbs17/MathBERT.

Fine-Tuning BERTs for Definition Extraction from Mathematical Text

Extracting Definienda in Mathematical Scholarly Articles with Transformers

Fine-tune BERT with Sparse Self-Attention Mechanism

Fine-Tuning Large Language Models for Scientific Text Classification: A Comparative Study

UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction

Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction

Automated Discovery of Mathematical Definitions in Text with Deep Neural Networks

How to Fine-Tune BERT for Text Classification?

Math Function Recognition with Fine-Tuning Pre-Trained Models

Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge

On Robustness and Bias Analysis of BERT-Based Relation Extraction

Empirical Study of LLM Fine-Tuning for Text Classification in Legal Document Review

MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education

A Closer Look at How Fine-tuning Changes BERT

AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning

Extracting Mathematical Concepts from Text

Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning

Single task fine-tune BERT for text classification

RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification

MathBERT: A Pre-Trained Model for Mathematical Formula Understanding

BERTer: The Efficient One