Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency

Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, Nan Duan

2024-07-02

Abstract: Large language models (LLMs) have exhibited remarkable ability in code generation. However, generating the correct solution in a single attempt still remains a challenge. Prior works utilize verification properties in software engineering to verify and re-rank solutions in a majority voting manner. But the assumption behind them that generated verification properties have better qualities than solutions may not always hold. In this paper, we treat them equally as different perspectives of LLMs' reasoning processes. We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency across outputs from multiple perspectives. Specifically, we prompt LLMs to generate diverse outputs from three perspectives, Solution, Specification and Test case, constructing a 3-partite graph. With two measure functions of consistency, we embed both inter- and intra-consistency information into the graph. The optimal choice of solutions is then determined based on analysis in the graph. MPSC significantly boosts performance of foundation models (ChatGPT in this paper) on various benchmarks, including HumanEval (+15.91%), MBPP (+6.43%) and CodeContests (+9.37%), even surpassing GPT-4.

Computation and Language, Artificial Intelligence, Software Engineering

What problem does this paper attempt to address?

Large language models (LLMs) perform well on code generation tasks, but generating the correct solution in a single attempt remains challenging. Existing methods borrow verification properties from software engineering to verify and re-rank generated solutions. These methods assume that the generated verification properties are of higher quality than the generated solutions, which is not always the case.

To overcome this issue, the paper proposes the Multi-Perspective Self-Consistency (MPSC) framework. The framework treats solutions, specifications, and test cases equally, as outputs from different perspectives of the LLM's reasoning, and combines intra-consistency (within a perspective) and inter-consistency (across perspectives) in a 3-partite graph. The optimal solution is then selected by analyzing this graph. The main contributions of the MPSC framework are:

1. **Multi-Perspective Generation**: Generating diverse outputs from three perspectives (solutions, specifications, and test cases).
2. **Consistency Measurement**: Evaluating intra-consistency and inter-consistency with two measure functions.
3. **Graph Construction and Optimization**: Representing the generated outputs as vertices and connecting vertices from different perspectives with edges to form a 3-partite graph, then selecting the most consistent solution by optimizing an objective function.

Experimental results show that MPSC significantly improves the performance of the foundation model (ChatGPT in this paper) on HumanEval (+15.91%), MBPP (+6.43%), and CodeContests (+9.37%), even surpassing GPT-4.
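The graph-based selection described above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not state the two measure functions or the objective, so the code below substitutes simple stand-ins, all of which are assumptions. Inter-consistency is taken to be "a solution passes a test", "a solution's outputs satisfy a specification", or "a test satisfies a specification"; intra-consistency is taken to be the share of solutions with identical behavior; and scores are spread over the graph with a generic damped propagation step. The names `run`, `holds`, `inter_consistency`, and `rank_solutions` are hypothetical.

```python
import math

def run(sol, x):
    """Run a candidate solution, mapping any crash to a sentinel value."""
    try:
        return sol(x)
    except Exception:
        return "<error>"

def holds(spec, x, y):
    """Check a specification (a post-condition on input x and output y)."""
    try:
        return bool(spec(x, y))
    except Exception:
        return False

def inter_consistency(a, b, inputs):
    """Assumed edge weight in [0, 1] between vertices of different perspectives."""
    (ka, va), (kb, vb) = a, b
    if ka == kb:
        return 0.0  # 3-partite graph: no edges inside one perspective
    pair = {ka: va, kb: vb}
    if "sol" in pair and "test" in pair:
        x, y = pair["test"]
        return float(run(pair["sol"], x) == y)
    if "sol" in pair and "spec" in pair:
        ok = sum(holds(pair["spec"], x, run(pair["sol"], x)) for x in inputs)
        return ok / len(inputs)
    x, y = pair["test"]
    return float(holds(pair["spec"], x, y))

def rank_solutions(solutions, specs, tests, alpha=0.8, iters=200):
    """Score solutions by propagating agreement over the graph (tests must be non-empty)."""
    inputs = [x for x, _ in tests]
    nodes = ([("sol", s) for s in solutions] + [("spec", p) for p in specs]
             + [("test", t) for t in tests])
    n = len(nodes)
    W = [[inter_consistency(nodes[i], nodes[j], inputs) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) or 1.0 for row in W]  # avoid dividing by zero for isolated vertices
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Assumed intra-consistency prior: fraction of solutions with the same outputs;
    # specifications and tests get a uniform prior within their perspective.
    sigs = [tuple(run(s, x) for x in inputs) for s in solutions]
    prior = ([sigs.count(sig) / len(sigs) for sig in sigs]
             + [1 / len(specs)] * len(specs) + [1 / len(tests)] * len(tests))
    f = prior[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * prior[i]
             for i in range(n)]
    return f[:len(solutions)]

# Toy task: absolute value. The last solution and the last test are deliberately wrong.
solutions = [lambda x: x if x >= 0 else -x, lambda x: abs(x), lambda x: x]
specs = [lambda x, y: y >= 0, lambda x, y: y == x or y == -x]
tests = [(3, 3), (-2, 2), (0, 0), (-1, -1)]

scores = rank_solutions(solutions, specs, tests)
best = max(range(len(scores)), key=scores.__getitem__)
print("scores:", [round(s, 3) for s in scores], "best index:", best)
```

The toy data includes an incorrect test case on purpose: because no perspective is trusted over the others, a faulty test that agrees only with the faulty solution carries little weight once the scores settle.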

Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models

Multi-Programming Language Ensemble for Code Generation in Large Language Model

LLM2: Let Large Language Models Harness System 2 Reasoning

Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

Universal Self-Consistency for Large Language Model Generation

Multi-Model Consistency for LLMs’ Evaluation

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

Large Language Models Can Self-Improve in Long-context Reasoning

Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning

Large Language Models Are Better Reasoners with Self-Verification

Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

Large Language Models as Test Case Generators: Performance Evaluation and Enhancement

Improving Natural Language Capability of Code Large Language Model

Supervised Knowledge Makes Large Language Models Better In-context Learners

Enhancing Large Language Models' Situated Faithfulness to External Contexts

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?