GFLean: An Autoformalisation Framework for Lean via GF

Shashank Pathak
2024-04-02
Abstract: We present an autoformalisation framework for the Lean theorem prover, called GFLean. GFLean uses a high-level grammar writing tool called Grammatical Framework (GF) for parsing and linearisation. GFLean is implemented in Haskell. We explain the functionalities of GFLean and its inner workings, and discuss its limitations. We also discuss how neural-network-based and rule-based translation programs can be used together, complementing each other, to build robust autoformalisation frameworks.
Computation and Language, Logic
What problem does this paper attempt to address?
The paper proposes GFLean, an autoformalisation framework that converts mathematical text written in natural language into input for the Lean proof assistant. Autoformalisation means translating such text into a formal logical system so that it can be proof-checked, stored, and manipulated by computer. GFLean uses Grammatical Framework (GF), a high-level grammar writing tool, for parsing and linearisation, and is implemented in Haskell. It targets "formal pattern" sentences, that is, sentences with strictly mathematical content that can be formalised, and does not handle "informal pattern" sentences such as introductions and commentary. It currently handles only statements, not proofs. The input and output examples in the paper show that it can process natural-language expressions involving basic arithmetic operations, quantifiers, and negation. The workflow has four stages: parsing Simplified ForTheL expressions, simplifying the resulting abstract syntax trees, converting them to Lean expressions, and linearising those into Lean input. The paper also discusses GFLean's limitations: a limited vocabulary, an inability to handle complex language constructions, and a vocabulary that cannot be extended at runtime because of the static nature of GF. It proposes combining neural-network-based and rule-based translation programs to build a more robust autoformalisation framework. GFLean is compared with earlier work such as the SAD, Naproche, and MathNat projects, which highlights the simplifications GFLean makes and its limitations in handling mathematical language. In summary, the paper addresses the autoformalisation of mathematical text into the language of the Lean proof assistant for verification and computer processing, and discusses the challenges of the current implementation and possible future improvements.
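The four-stage workflow described above (parse, simplify, translate, linearise) can be illustrated with a toy sketch. This is not GFLean's code: GFLean is written in Haskell and relies on GF grammars, whereas the Python below hard-codes a tiny hypothetical lexicon and a single sentence shape ("every/no NOUN is ADJECTIVE"). The function names, the dataclasses, the predicate names, and the Lean-like output strings are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical fixed lexicon. Like a static grammar, it cannot grow at runtime.
NOUN_TYPES = {"integer": "ℤ", "real": "ℝ"}
ADJECTIVES = {"even", "odd", "positive"}


@dataclass(frozen=True)
class Sentence:
    """Stage 1 output: toy abstract syntax tree of the input sentence."""
    quantifier: str  # "every" or "no"
    noun: str
    adjective: str


@dataclass(frozen=True)
class Simplified:
    """Stage 2 output: 'no N is A' is normalised to 'every N is not A'."""
    noun: str
    adjective: str
    negated: bool


@dataclass(frozen=True)
class Pred:
    name: str
    arg: str


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class Forall:
    var: str
    type_name: str
    body: object


def parse(text: str) -> Sentence:
    """Parse 'every/no NOUN is ADJECTIVE'; reject anything else."""
    words = text.lower().rstrip(".").split()
    if len(words) != 4 or words[2] != "is":
        raise ValueError(f"unsupported sentence shape: {text!r}")
    quantifier, noun, _, adjective = words
    if quantifier not in ("every", "no"):
        raise ValueError(f"unknown quantifier: {quantifier!r}")
    if noun not in NOUN_TYPES or adjective not in ADJECTIVES:
        raise ValueError(f"word outside the fixed vocabulary in: {text!r}")
    return Sentence(quantifier, noun, adjective)


def simplify(s: Sentence) -> Simplified:
    """Reduce both quantifiers to a universal with an optional negation."""
    return Simplified(s.noun, s.adjective, negated=(s.quantifier == "no"))


def translate(s: Simplified) -> Forall:
    """Map the simplified tree to a Lean-like expression tree."""
    body = Pred(s.adjective, "x")
    return Forall("x", NOUN_TYPES[s.noun], Not(body) if s.negated else body)


def linearise(expr) -> str:
    """Render the expression tree as a Lean-like string."""
    if isinstance(expr, Forall):
        return f"∀ {expr.var} : {expr.type_name}, {linearise(expr.body)}"
    if isinstance(expr, Not):
        return f"¬ {linearise(expr.body)}"
    if isinstance(expr, Pred):
        return f"{expr.name} {expr.arg}"
    raise TypeError(f"unexpected node: {expr!r}")


def formalise(text: str) -> str:
    """Run all four stages in order."""
    return linearise(translate(simplify(parse(text))))
```

Under these assumptions, `formalise("No integer is odd.")` yields the string `∀ x : ℤ, ¬ odd x`, while a sentence containing a word outside the lexicon raises `ValueError`, a small-scale analogue of the limited-vocabulary limitation noted above.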