Grammar-obeying Program Synthesis: A Novel Approach Using Large Language Models and Many-Objective Genetic Programming

Ning Tao,Anthony Ventresque,Vivek Nallur,Takfarinas Saber
DOI: https://doi.org/10.1016/j.csi.2024.103938
IF: 3.721
2025-01-01
Computer Standards & Interfaces
Abstract:Program synthesis is an important challenge that has attracted significant research interest, especially in recent years with advancements in Large Language Models (LLMs). Although LLMs have demonstrated success in program synthesis, there remains a lack of trust in the generated code due to documented risks (e.g., code with known and risky vulnerabilities). Therefore, it is important to restrict the search space and avoid bad programs. In this work, pre-defined restricted Backus–Naur Form (BNF) grammars are utilised, which are considered ‘safe’, and the focus is on identifying the most effective technique for grammar-obeying program synthesis, where the generated code must be correct and conform to the predefined grammar. It is shown that while LLMs perform well in generating correct programs, they often fail to produce code that adheres to the grammar. To address this, a novel Similarity-Based Many-Objective Grammar Guided Genetic Programming (SBMaOG3P) approach is proposed, leveraging the programs generated by LLMs in two ways: (i) as seeds following a grammar mapping process and (ii) as targets for similarity measure objectives. Experiments on a well-known and widely used program synthesis dataset indicate that the proposed approach successfully improves the rate of grammar-obeying program synthesis compared to various LLMs and the state-of-the-art Grammar-Guided Genetic Programming. Additionally, the proposed approach significantly improved the solution in terms of the best fitness value of each run for 21 out of 28 problems compared to G3P.
What problem does this paper attempt to address?