CoqPilot, a plugin for LLM-based generation of proofs

Andrei Kozyrev, Gleb Solovev, Nikita Khramov, Anton Podkopaev
DOI: https://doi.org/10.1145/3691620.3695357
2024-10-25
Abstract: We present CoqPilot, a VS Code extension designed to help automate writing of Coq proofs. The plugin collects the parts of proofs marked with the admit tactic in a Coq file, i.e., proof holes, and combines LLMs along with non-machine-learning methods to generate proof candidates for the holes. Then, CoqPilot checks if each proof candidate solves the given subgoal and, if successful, replaces the hole with it. The focus of CoqPilot is twofold. Firstly, we want to allow users to seamlessly combine multiple Coq generation approaches and provide a zero-setup experience for our tool. Secondly, we want to deliver a platform for LLM-based experiments on Coq proof generation. We developed a benchmarking system for Coq generation methods, available in the plugin, and conducted an experiment using it, showcasing the framework's possibilities. Demo of CoqPilot is available at: https://youtu.be/oB1Lx-So9Lo. Code at: https://github.com/JetBrains-Research/coqpilot
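The workflow described in the abstract (find `admit` holes, ask several generators for candidates, keep the first candidate that checks) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CoqPilot's implementation: holes are assumed to be spelled exactly `admit.`, and the `Generator` and `Checker` callables are hypothetical stand-ins for the LLM and non-ML methods and for the Coq check of the subgoal.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a proof hole is written exactly as the tactic `admit.`;
# other spellings (e.g. `Admitted.` closing a proof) are not treated as holes.
HOLE = re.compile(r"\badmit\.")

# Hypothetical interfaces, not CoqPilot's API:
#   a generator maps the text before a hole to candidate proof scripts,
#   a checker decides whether a patched file is accepted.
Generator = Callable[[str], Iterable[str]]
Checker = Callable[[str], bool]


def find_holes(source: str) -> List[int]:
    """Return the character offset of every `admit.` hole, in file order."""
    return [m.start() for m in HOLE.finditer(source)]


def fill_holes(source: str, generators: List[Generator], check: Checker) -> str:
    """Replace each hole with the first candidate that the checker accepts.

    Holes are processed from last to first, so patching a later hole never
    shifts the offsets of earlier ones. Holes that no candidate solves are
    left as `admit.`.
    """
    for start in reversed(find_holes(source)):
        end = start + len("admit.")
        context = source[:start]  # everything the generators may look at
        solved = False
        for generate in generators:  # try each generation method in turn
            for candidate in generate(context):
                patched = source[:start] + candidate + source[end:]
                if check(patched):
                    source = patched
                    solved = True
                    break
            if solved:
                break
    return source
```

For example, with a toy generator that proposes `bad.` and then `auto.`, and a checker that rejects any file containing `bad.`, the single hole in `Proof. admit. Admitted.` is filled with `auto.`. Trying generators in a fixed order keeps the sketch simple; a real tool could instead run them in parallel.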
Software Engineering, Artificial Intelligence, Logic in Computer Science
What problem does this paper attempt to address?
The paper aims to simplify and automate the writing of Coq proofs. It introduces CoqPilot, a VS Code plugin that fills in the missing parts of Coq proofs (the "proof holes" marked with `admit`) by combining large language models (LLMs) with non-machine-learning methods. The main problems the paper addresses are:

1. **Simplifying Coq proof writing**:
   - Coq is an interactive theorem prover, and writing formal proofs in it is time-consuming and requires considerable experience. CoqPilot reduces this effort by generating proof fragments automatically.
   - Users can seamlessly combine multiple Coq proof generation methods, and the tool offers a zero-setup experience.
2. **Raising the success rate of generated proofs**:
   - By integrating several generation methods (LLM-based and traditional ones), CoqPilot increases the chance that at least one candidate closes a given hole.
   - The plugin automatically checks whether each proof candidate solves its subgoal and, on success, replaces the original `admit` with it.
3. **Providing an experimental platform**:
   - Beyond being a tool, CoqPilot is a platform for evaluating Coq proof generation methods. The authors developed a benchmarking framework, available in the plugin, for comparing different LLMs and other generation methods.
4. **Extending existing tools**:
   - CoqPilot can be combined with existing Coq automation tools (such as Tactician and CoqHammer) to improve on what these tools achieve individually.
   - Through premise selection and multi-round communication, CoqPilot makes better use of context information, helping LLMs generate more accurate proofs.
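Premise selection, mentioned in the last item above, decides which already-proved theorems from the file are shown to the LLM as context. The sketch below assumes one simple strategy, ranking theorems by the Jaccard similarity between the identifier tokens of their statements and of the goal; the tokenizer, the function names, and the `budget` parameter are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Set


def tokens(statement: str) -> Set[str]:
    """Split a Coq statement into identifier-like tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_']*", statement))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_premises(goal: str, theorems: Dict[str, str], budget: int) -> List[str]:
    """Return the names of the `budget` theorems whose statements are most similar to `goal`.

    `theorems` maps a theorem name to its statement. Python's sort is stable,
    so ties keep the order in which the theorems appear in the dictionary.
    """
    goal_tokens = tokens(goal)
    ranked = sorted(
        theorems,
        key=lambda name: jaccard(goal_tokens, tokens(theorems[name])),
        reverse=True,
    )
    return ranked[:budget]
```

With the goal `forall n : nat, n + 0 = n`, a statement such as `forall n : nat, n = n + 0` shares every token and ranks first, while a lemma about lists shares only `forall` and ranks lower. Token overlap ignores structure entirely; it is used here only because it is short and easy to verify.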
### Formula Representation

The description involves the following quantities:

- Suppose generation is run \( n \) times and each run returns \( k \) proof candidates. The total number of proof candidates is then:

\[ \text{Total Candidates} = n \times k \]

- In multi-round communication, suppose each round generates \( m \) proof candidates and at most \( d \) rounds are carried out. A new round starts only after a failed attempt, so if \( f \) is the number of failed attempts, the number of rounds actually carried out and the number of candidates they generate are:

\[ r = \min(d, f), \qquad \text{Total Generated} = m \times r \le m \times d \]

### Summary

By combining multiple generation methods with automatic verification of every candidate, CoqPilot simplifies writing Coq proofs, fills proof holes without manual setup, and gives researchers a platform for experimenting with and benchmarking Coq proof generation.
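The candidate counts from the Formula Representation section can be checked with a small simulation. This is a hedged sketch, not CoqPilot's multi-round algorithm: `fix` and `check` are hypothetical callables standing in for the LLM repair request and the Coq check, and the choice to repair the first failed candidate of each round is an assumption made to keep the loop linear.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: `fix(proof, m)` asks the model for m repaired
# versions of a failed proof; `check(proof)` says whether a proof is accepted.
Fixer = Callable[[str, int], List[str]]
Checker = Callable[[str], bool]


def total_candidates(n: int, k: int) -> int:
    """Single-round generation: n runs, k candidates per run."""
    return n * k


def multi_round(first: str, fix: Fixer, check: Checker, m: int, d: int) -> Tuple[Optional[str], int, int]:
    """Return (accepted proof or None, rounds run r, candidates generated).

    Every failed attempt triggers one more round, up to d rounds, so
    r = min(d, f) where f is the number of failed attempts, and the number
    of generated candidates is m * r whenever `fix` returns exactly m items.
    """
    if check(first):
        return first, 0, 0  # no failed attempt, so f = 0 and r = 0
    failed, rounds, generated = first, 0, 0
    while rounds < d:
        rounds += 1
        batch = fix(failed, m)
        generated += len(batch)
        for candidate in batch:
            if check(candidate):
                return candidate, rounds, generated
        # Assumption: the next round repairs the first failed candidate of this batch.
        if batch:
            failed = batch[0]
    return None, rounds, generated
```

As a usage example, take a toy fixer that appends `!` to a proof and a checker that accepts proofs containing at least two `!`. Starting from `x` with \( m = 3 \) and \( d = 5 \), the first attempt and the first round fail (\( f = 2 \)), the second round succeeds, and the function reports \( r = 2 \) rounds and \( m \times r = 6 \) generated candidates. With \( d = 1 \) it gives up after one round and 3 candidates.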