Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions

Xianren Zhang, Xianfeng Tang, Hui Liu, Zongyu Wu, Qi He, Dongwon Lee, Suhang Wang
2024-10-16
Abstract: Recent studies show that LLMs, particularly open-source models, struggle to follow complex instructions with multiple constraints. Despite the importance, methods to improve LLMs' adherence to such constraints remain unexplored, and current research focuses on evaluating this ability rather than developing solutions. While a few studies enhance constraint adherence through model tuning, this approach is computationally expensive and heavily reliant on training data quality. An alternative is to leverage LLMs' self-correction capabilities, allowing them to adjust responses to better meet specified constraints. However, this self-correction ability of LLMs is limited by the feedback quality, as LLMs cannot autonomously generate reliable feedback or detect errors. Moreover, the self-refinement process heavily depends on few-shot examples that illustrate how to modify responses to meet constraints. As constraints in complex instructions are diverse and vary widely, manually crafting few-shot examples for each constraint type can be labor-intensive and sub-optimal. To deal with these two challenges, we propose the Divide-Verify-Refine (DVR) framework with three steps: (1) Divide complex instructions into single constraints and prepare appropriate tools; (2) Verify: To address the feedback quality problem, these tools will rigorously verify responses and provide reliable feedback; (3) Refine: To address the constraint diversity challenge, we design a refinement repository that collects successful refinement processes and uses them as few-shot demonstrations for future cases, allowing LLMs to learn from the past experience during inference. Additionally, we develop a new dataset of complex instructions, each containing 1-6 constraints. Experiments show that the framework significantly improves performance, doubling LLama3.1-8B's constraint adherence on instructions with 6 constraints.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty that large language models (LLMs) have in following complex instructions, especially instructions that contain multiple constraints. It focuses on three issues:

1. **Feedback Reliability Problem**:
   - The feedback that LLMs generate during self-correction is of low quality, so improvements are unstable and performance sometimes degrades.
   - LLMs cannot reliably generate feedback or detect errors on their own, especially for multi-constraint instructions.
2. **Constraint Diversity Problem**:
   - Different types of constraints (such as text length, number of bullet points, or inclusion of specific keywords) require different modification methods.
   - Manually creating few-shot examples for each constraint type is both time-consuming and inefficient.
3. **Limitations of Existing Datasets**:
   - Existing datasets lack complexity and internal consistency, leading to incomplete evaluations.
   - Most benchmark datasets contain only 1-2 constraints per instruction, while real applications may involve more.

To solve these problems, the paper proposes a framework named Divide-Verify-Refine (DVR). It has three steps:

1. **Divide**:
   - Decompose a complex instruction into individual constraints and prepare a corresponding tool for each.
   - For example, for an instruction requiring 4 key points, DVR extracts the task of "checking the number of key points" and prepares a tool for it (such as a Python script).
2. **Verify**:
   - Use external tools to rigorously check whether the response meets each constraint and to provide reliable feedback.
   - If the response violates a constraint, the tool points out the specific error and suggests how to fix it. For example, if 4 key points are required but only 2 are present, the tool reports "2 more key points need to be added".
3. **Refine**:
   - Use the feedback and past successful refinement examples (from the refinement repository) to adjust the response so that it meets all constraints.
   - Successful refinement processes are saved in the refinement repository for future use.

In addition, to make the evaluation more comprehensive, the authors construct a new complex-instruction dataset named ComplexInstruct, in which each instruction contains 1-6 constraints. Experimental results show that the DVR framework significantly improves the ability of LLMs to follow complex instructions, especially instructions with many constraints. In this way, the DVR framework addresses both the feedback reliability and the constraint diversity problems, and it offers a scalable solution that works at inference time without requiring extensive retraining.
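
The Divide, Verify, and Refine steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `bullet_count_tool`, `keyword_tool`, `verify`, `dvr`, and `toy_refine` are hypothetical, the refinement repository is modeled as a plain dictionary, the instruction is assumed to be already divided into two constraints, and `toy_refine` is a rule-based stand-in for what would be an LLM call prompted with the feedback and the retrieved demonstrations.

```python
import re
from typing import Callable, Optional

# A verifier returns None when the constraint holds, else a feedback string.
Verifier = Callable[[str], Optional[str]]

def bullet_count_tool(required: int) -> Verifier:
    """Hypothetical tool: the response must contain exactly `required` bullets."""
    def check(response: str) -> Optional[str]:
        found = len(re.findall(r"^[ \t]*[-*] ", response, flags=re.MULTILINE))
        if found == required:
            return None
        if found < required:
            return f"{required - found} more bullet points need to be added"
        return f"{found - required} bullet points need to be removed"
    return check

def keyword_tool(keyword: str) -> Verifier:
    """Hypothetical tool: the response must mention `keyword` (case-insensitive)."""
    def check(response: str) -> Optional[str]:
        if keyword.lower() in response.lower():
            return None
        return f"the keyword '{keyword}' must be included"
    return check

def verify(response: str, tools: dict) -> dict:
    """Verify step: return {constraint_name: feedback} for violated constraints."""
    failures = {}
    for name, tool in tools.items():
        feedback = tool(response)
        if feedback is not None:
            failures[name] = feedback
    return failures

def dvr(response: str, tools: dict, refine_fn, repository: dict, max_rounds: int = 3) -> str:
    """Verify/refine loop; stores only refinements that fixed their constraint."""
    for _ in range(max_rounds):
        failures = verify(response, tools)
        if not failures:
            break
        # Assumption: handle one violated constraint per round.
        name, feedback = next(iter(failures.items()))
        demos = repository.get(name, [])  # past successes as few-shot examples
        candidate = refine_fn(response, feedback, demos)
        if tools[name](candidate) is None:
            repository.setdefault(name, []).append((response, feedback, candidate))
        response = candidate
    return response

def toy_refine(response: str, feedback: str, demos: list) -> str:
    """Stand-in for an LLM call; only understands the two feedback shapes above."""
    m = re.match(r"(\d+) more bullet points", feedback)
    if m:
        extra = "\n".join(f"- extra point {i + 1}" for i in range(int(m.group(1))))
        return response + "\n" + extra
    m = re.match(r"the keyword '(.+)' must be included", feedback)
    if m:
        return response + f"\nNote: {m.group(1)}"
    return response

# Divide step (assumed done): one tool per constraint in the instruction.
tools = {"bullets": bullet_count_tool(4), "keyword": keyword_tool("DVR")}
repository = {}
draft = "- first point\n- second point"
final = dvr(draft, tools, toy_refine, repository)
print(final)
print(verify(final, tools))
```

Because the feedback comes from deterministic checks rather than from the model itself, the loop can tell whether a refinement actually fixed the violated constraint, which is what makes it safe to store that refinement as a demonstration for later cases.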