Abstract: We present CoqPilot, a VS Code extension designed to help automate writing of Coq proofs. The plugin collects the parts of proofs marked with the admit tactic in a Coq file, i.e., proof holes, and combines LLMs along with non-machine-learning methods to generate proof candidates for the holes. Then, CoqPilot checks if each proof candidate solves the given subgoal and, if successful, replaces the hole with it. The focus of CoqPilot is twofold. Firstly, we want to allow users to seamlessly combine multiple Coq generation approaches and provide a zero-setup experience for our tool. Secondly, we want to deliver a platform for LLM-based experiments on Coq proof generation. We developed a benchmarking system for Coq generation methods, available in the plugin, and conducted an experiment using it, showcasing the framework's possibilities. A demo of CoqPilot is available at https://youtu.be/oB1Lx-So9Lo and the code at https://github.com/JetBrains-Research/coqpilot

What problem does this paper attempt to address?

The paper aims to simplify and automate the writing of Coq proofs. It introduces CoqPilot, a VS Code plugin that automatically generates the missing parts of Coq proofs ("proof holes") by combining large language models (LLMs) with non-machine-learning methods. The main problems it addresses are:

1. **Simplifying Coq Proof Writing**:
   - Coq is an interactive theorem prover, and writing formal proofs is time-consuming and requires considerable experience. CoqPilot aims to simplify this process by automatically generating proof fragments.
   - The approach lets users seamlessly combine multiple Coq generation methods and provides a zero-setup experience.
2. **Improving the Quality and Efficiency of Generated Proofs**:
   - By integrating multiple generation methods (LLM-based methods and other traditional methods), CoqPilot can generate higher-quality proofs.
   - The plugin automatically checks whether each generated proof candidate is valid and, when the check succeeds, replaces the original `admit` tactic with that candidate.
3. **Providing an Experimental Platform**:
   - CoqPilot is not only a tool but also an experimental platform for evaluating different approaches to Coq proof generation. The authors developed a benchmarking framework for comparing different LLMs and other generation methods.
4. **Enhancing the Functionality of Existing Tools**:
   - CoqPilot can be used in combination with existing Coq automation tools (such as Tactician and CoqHammer) to further improve the performance of these tools.
   - Through premise selection and multi-round communication, CoqPilot can make better use of context information to help LLMs generate more accurate proofs.
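The workflow described above (collect `admit` holes, generate candidates, check each one, replace the hole on success) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CoqPilot's implementation: the `Generator` and `Checker` types and the `fill_holes` function are hypothetical names, the checker is a stand-in for a real Coq check, and the regular expression does not account for `admit.` appearing inside comments or strings.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical types for this sketch; they are not CoqPilot's actual API.
Generator = Callable[[str], Iterable[str]]  # proof context -> proof candidates
Checker = Callable[[str, str], bool]        # (proof context, candidate) -> valid?

# Naive hole detection: matches `admit.` but not `Admitted.`.
ADMIT = re.compile(r"\badmit\.")


def fill_holes(source: str, generators: List[Generator], check: Checker) -> str:
    """Replace each `admit.` with the first candidate the checker accepts.

    Holes for which no candidate passes the check are left untouched.
    """
    pos = 0
    while True:
        hole = ADMIT.search(source, pos)
        if hole is None:
            return source
        context = source[: hole.start()]
        replacement = None
        # Try every generator in order; stop at the first valid candidate.
        for generate in generators:
            for candidate in generate(context):
                if check(context, candidate):
                    replacement = candidate
                    break
            if replacement is not None:
                break
        if replacement is None:
            pos = hole.end()  # keep the hole and move on
        else:
            source = context + replacement + source[hole.end():]
            pos = hole.start() + len(replacement)
```

A caller would pass one generator per approach (for example, one wrapping an LLM and one wrapping a traditional automation tool) and a checker backed by an actual Coq process. Trying generators in order and stopping at the first accepted candidate is a design choice of this sketch, not something the source specifies.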
### Formula Representation

The description involves the following formulas:

- Suppose generation is run \( n \) times and each run produces \( k \) proof candidates. The total number of proof candidates generated is:
  \[ \text{Total Candidates} = n\times k \]
- In multi-round communication, suppose \( m \) proof candidates are generated in each round and at most \( d \) rounds are carried out. The number of rounds actually performed is:
  \[ \text{Total Rounds} = \min(d, \text{number of failed attempts}) \]

### Summary

By combining multiple generation methods with an automatic verification mechanism, CoqPilot simplifies the process of writing Coq proofs, improves the quality and efficiency of generated proofs, and provides researchers with an experimental platform.
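The two formulas in the Formula Representation section can be checked with a small worked example. The function names and the numeric values below are illustrative assumptions and do not come from the paper.

```python
def total_candidates(n: int, k: int) -> int:
    """Total Candidates = n * k."""
    return n * k


def total_rounds(d: int, failed_attempts: int) -> int:
    """Total Rounds = min(d, number of failed attempts)."""
    return min(d, failed_attempts)


# Illustrative values only.
print(total_candidates(n=3, k=5))             # 15
print(total_rounds(d=2, failed_attempts=4))   # 2
print(total_rounds(d=2, failed_attempts=1))   # 1
```

The last two calls show both cases of the minimum: the round limit \( d \) applies when there are more failed attempts than allowed rounds, and otherwise the number of failed attempts determines how many rounds are run.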

CoqPilot, a plugin for LLM-based generation of proofs
