ImProver: Agent-Based Automated Proof Optimization

Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

2024-10-07

Abstract: Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not be optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.
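
To make the task concrete, here is a minimal Lean 4 sketch of what optimizing a proof for length can look like. It is an illustration only and is not taken from the paper: the statement is a toy example, and measuring length by counting tactic steps is an assumption made here for the sake of the example. Both proofs establish the same statement, so the rewrite stays correct while using fewer steps.

```lean
-- Original: a tactic proof that splits the goal and closes each part.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

-- A shorter rewrite of the same statement, given as a single term.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩
```

A readability-oriented metric might prefer the first form, which is why the criterion is left to the user.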

Artificial Intelligence, Computation and Language, Machine Learning, Logic in Computer Science

What problem does this paper attempt to address?

The paper addresses the problem of automated proof optimization: rewriting a formal proof of a mathematical theorem so that it remains correct while improving a chosen criterion, such as length or readability. It introduces ImProver, an agent built on large language models (LLMs) that rewrites proofs in the Lean proof assistant to optimize arbitrary user-defined metrics. The study found that naively applying LLMs to proof optimization falls short, so the paper proposes a series of improvements:

1. **Chain-of-States Prompting**: Using Lean's metaprogramming facilities to annotate the intermediate proof state before each step, so that the model can see the goals and hypotheses at every point in the proof.
2. **Output Format Adjustment**: Introducing different output formats, such as lists and tree structures, to generate more structured proofs.
3. **Sampling Method Improvement**: Introducing sampling methods, such as best-of-n sampling and iterative improvement, to raise the quality of the generated proofs.
4. **Retrieval Enhancement**: Using Maximum Marginal Relevance (MMR) retrieval to pull relevant information from existing databases into the prompt.

Experiments on real-world undergraduate theorems, competition problems, and research-level mathematical theorems showed that ImProver significantly outperformed the baseline GPT-4 model in optimizing proof length and readability. The paper also reports ablation tests that assess the contribution of each component and identify the best parameter combinations. Overall, ImProver effectively optimizes mathematical proofs of various difficulty levels.
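
The MMR retrieval idea in item 4 can be sketched in a few lines of Python. This is a generic illustration of Maximum Marginal Relevance under stated assumptions, not ImProver's implementation: the function names `cosine` and `mmr_select`, the weight `lam`, and the toy two-dimensional embedding vectors are all hypothetical. Each round picks the candidate that is most relevant to the query while being least similar to what has already been selected.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query, docs, k, lam=0.5):
    """Return indices of up to k docs chosen by Maximum Marginal Relevance.

    lam = 1.0 ranks purely by relevance to the query; smaller values
    penalize candidates that resemble already-selected documents.
    """
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical embeddings: docs 0 and 1 are near-duplicates, doc 2 differs.
query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
balanced = mmr_select(query, docs, k=2, lam=0.5)
relevance_only = mmr_select(query, docs, k=2, lam=1.0)
```

With the balanced weight, the second pick skips the near-duplicate in favor of the more distinct document, whereas pure relevance ranking keeps both near-duplicates. In a retrieval-augmented prompt, that diversity helps avoid filling the context with redundant material.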

LeanAgent: Lifelong Learning for Formal Theorem Proving

Towards Large Language Models as Copilots for Theorem Proving in Lean

Proof Automation with Large Language Models

LeanReasoner: Boosting Complex Logical Reasoning with Lean

A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems

Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning

Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4

NaturalProver: Grounded Mathematical Proof Generation with Language Models

Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

Lean-STaR: Learning to Interleave Thinking and Proving

Large Language Models as Optimizers

LangProp: A code optimization framework using Large Language Models applied to driving

Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding

Leveraging Large Language Models for Automated Proof Synthesis in Rust