Abstract: Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.

What problem does this paper attempt to address?

The paper addresses two limitations of existing theorem provers based on large language models (LLMs): weak generalization and poor adaptation to mathematical knowledge that spans multiple domains. Existing methods typically train or fine-tune an LLM on a specific dataset to perform well in a particular domain, such as undergraduate-level mathematics, and they struggle to generalize to advanced mathematics. They also operate on static domains, so they fail to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically.

To overcome these issues, the paper proposes LeanAgent, a lifelong learning framework for theorem proving. LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. The framework introduces several key innovations, including:

1. **Curriculum Learning Strategy**: Optimizes the learning trajectory by ordering material according to mathematical difficulty.
2. **Dynamic Database**: Efficiently manages the evolving mathematical knowledge.
3. **Progressive Training**: Balances stability and plasticity, allowing the model to retain old knowledge while learning new tasks.

With these innovations, LeanAgent proves 162 theorems previously unproved by humans across 23 Lean repositories, many from advanced mathematics, including challenging theorems in abstract algebra and algebraic topology. The paper also analyzes LeanAgent's performance on lifelong learning metrics, reporting exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks.
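The curriculum learning idea above can be illustrated with a minimal Python sketch. It is not LeanAgent's implementation: the `Theorem` class, the use of proof-step count as a difficulty proxy, the averaging rule, and the repository names are all assumptions made for illustration, since the summary does not say how difficulty is measured. The sketch only shows the general technique of ordering repositories from easiest to hardest before training on them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Theorem:
    name: str
    proof_steps: int  # hypothetical difficulty proxy, not taken from the paper


def repo_difficulty(theorems):
    """Average proxy difficulty of one repository's theorems (assumed rule)."""
    return mean(t.proof_steps for t in theorems)


def curriculum_order(repos):
    """Return repository names sorted from easiest to hardest."""
    return sorted(repos, key=lambda name: repo_difficulty(repos[name]))


# Toy data: repository names and step counts are made up for the example.
repos = {
    "algebraic_topology": [Theorem("t1", 12), Theorem("t2", 20)],
    "basic_logic": [Theorem("t3", 1), Theorem("t4", 3)],
    "abstract_algebra": [Theorem("t5", 6), Theorem("t6", 8)],
}

order = curriculum_order(repos)
print(order)
```

On this toy data the repositories are visited in the order `basic_logic`, `abstract_algebra`, `algebraic_topology`, mirroring the progression from basic concepts to advanced topics that the abstract describes.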

LeanAgent: Lifelong Learning for Formal Theorem Proving

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Towards Large Language Models as Copilots for Theorem Proving in Lean

Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4

MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

ImProver: Agent-Based Automated Proof Optimization

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

LeanReasoner: Boosting Complex Logical Reasoning with Lean

InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems

Lean-STaR: Learning to Interleave Thinking and Proving

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

LEMMA: Bootstrapping High-Level Mathematical Reasoning with Learned Symbolic Abstractions

LEGO-Prover: Neural Theorem Proving with Growing Libraries

SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

Learning Formal Mathematics From Intrinsic Motivation

A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems

Towards a Mathematics Formalisation Assistant using Large Language Models

Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback