Abstract: Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.

What problem does this paper attempt to address?

The paper addresses automated theorem proving with large language models (LLMs) and proposes a method named LEGO-Prover. LLM-based provers have shown promising results but still struggle to prove even middle-school-level theorems. One limitation they share is the assumption that the theorem library stays fixed throughout the proving process, so intermediate results cannot be reused in later proofs.

LEGO-Prover removes this limitation by maintaining a growing skill library of verified lemmas, which the paper calls "skills". Proofs are constructed modularly: the LLM retrieves existing skills from the library, applies them in the current proof, and creates new skills along the way. These skills are further evolved by prompting an LLM, which enriches the library so that it can support increasingly intricate mathematical problems. The learned library also narrows the gap between human proofs and formal proofs by making it easier to fill in missing steps.

On the miniF2F benchmark, LEGO-Prover raises the pass rate from 48.0% to 57.0% on miniF2F-valid and from 45.5% to 50.0% on miniF2F-test. During the proving process it generated over 20,000 skills (theorems or lemmas) and added them to the library. An ablation study indicates that the newly added skills do help with proving theorems, raising the success rate from 47.1% to 50.4%.

The name reflects the modular nature of LEGO bricks: the proof is broken into sub-goal lemmas that can each be proved independently, and these lemmas are then assembled into the final proof.

In summary, LEGO-Prover improves LLM-based automated theorem proving by combining a growing skill library with a modular proof strategy.
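The retrieve-then-grow loop described above can be illustrated with a small Python sketch. This is not the paper's implementation: the `Skill` and `SkillLibrary` names, the `verified` flag (standing in for a call to a formal proof checker), and the token-overlap similarity are all assumptions chosen to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """A lemma stored in the library (hypothetical structure)."""
    name: str       # illustrative identifier for the lemma
    statement: str  # text used to match the skill against a goal
    proof: str      # formal proof text, assumed to be checked elsewhere


def _tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens; a crude stand-in for a real retriever."""
    return set(text.lower().split())


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill, verified: bool) -> bool:
        """Store a skill only if it was verified and is not already present."""
        if not verified or skill.name in self.skills:
            return False
        self.skills[skill.name] = skill
        return True

    def retrieve(self, goal: str, k: int = 2) -> list[Skill]:
        """Return the k skills whose statements overlap most with the goal."""
        goal_tokens = _tokens(goal)

        def score(skill: Skill) -> float:
            # Jaccard similarity between goal tokens and statement tokens.
            stmt_tokens = _tokens(skill.statement)
            union = goal_tokens | stmt_tokens
            return len(goal_tokens & stmt_tokens) / len(union) if union else 0.0

        ranked = sorted(self.skills.values(), key=score, reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    library = SkillLibrary()
    library.add(
        Skill("sq_nonneg", "the square of a real number is nonnegative", "..."),
        verified=True,
    )
    # An unverified lemma is rejected, so the library only ever holds checked skills.
    library.add(Skill("unchecked", "an unverified claim", "..."), verified=False)
    for skill in library.retrieve("show the square of x is nonnegative", k=1):
        print(skill.name)
```

The design point the sketch tries to capture is that the library only grows through verified additions, so every retrieved skill can be used in a later proof without being re-checked. A real system would likely replace the token-overlap score with a learned or embedding-based retriever.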

LEGO-Prover: Neural Theorem Proving with Growing Libraries
