Abstract: Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches train or fine-tune an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics, and they generalize poorly to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and in backward transfer, where learning new tasks improves performance on previously learned tasks. These results highlight LeanAgent's continuous generalizability and improvement and help explain its superior theorem-proving performance.

Generative Language Modeling for Automated Theorem Proving

NaturalProver: Grounded Mathematical Proof Generation with Language Models

ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Proof Automation with Large Language Models

Experimental results from applying GPT-4 to an unpublished formal language

Baldur: Whole-Proof Generation and Repair with Large Language Models

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Large Language Models' Understanding of Math: Source Criticism and Extrapolation

Towards Large Language Models as Copilots for Theorem Proving in Lean

math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories

LeanAgent: Lifelong Learning for Formal Theorem Proving

Large Language Models for Mathematicians

Automated Theorem Provers Help Improve Large Language Model Reasoning

Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

Leveraging Large Language Models for Automated Proof Synthesis in Rust

Automating the Generation of High School Geometry Proofs using Prolog in an Educational Context

FGeo-TP: A Language Model-Enhanced Solver for Geometry Problems

Examining the Emergence of Deductive Reasoning in Generative Language Models

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

Tree-Based Representation and Generation of Natural and Mathematical Language