Abstract: Recent studies show that LLMs, particularly open-source models, struggle to follow complex instructions with multiple constraints. Despite its importance, methods for improving LLMs' adherence to such constraints remain unexplored: current research focuses on evaluating this ability rather than on developing solutions. While a few studies enhance constraint adherence through model tuning, this approach is computationally expensive and heavily reliant on the quality of the training data. An alternative is to leverage LLMs' self-correction capabilities, allowing them to adjust responses to better meet the specified constraints. However, this self-correction ability is limited by feedback quality, as LLMs cannot autonomously generate reliable feedback or detect errors. Moreover, the self-refinement process depends heavily on few-shot examples that illustrate how to modify responses to meet constraints. Because the constraints in complex instructions are diverse and vary widely, manually crafting few-shot examples for each constraint type is labor-intensive and can be sub-optimal. To address these two challenges, we propose the Divide-Verify-Refine (DVR) framework, which has three steps: (1) Divide: decompose complex instructions into single constraints and prepare appropriate tools; (2) Verify: to address the feedback-quality problem, these tools rigorously verify responses and provide reliable feedback; (3) Refine: to address the constraint-diversity challenge, we design a refinement repository that collects successful refinement processes and uses them as few-shot demonstrations for future cases, allowing LLMs to learn from past experience during inference. Additionally, we develop a new dataset of complex instructions, each containing 1 to 6 constraints. Experiments show that the framework significantly improves performance, doubling Llama3.1-8B's constraint adherence on instructions with 6 constraints.
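The three DVR steps described in the abstract can be pictured as a verify-then-refine loop. The following is a minimal Python sketch under several stated assumptions, not the authors' implementation. The constraints are assumed to be already divided (in DVR the Divide step is performed by the LLM). The verification tools `max_words` and `all_lowercase` are hypothetical rule-based checkers. The repository retrieves demonstrations by constraint type only, which is an assumed retrieval criterion. `toy_refiner` is a deterministic placeholder for an LLM call. None of these names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A verification "tool" (hypothetical): returns None if the response satisfies
# the constraint, otherwise a feedback string describing the violation.
Verifier = Callable[[str], Optional[str]]
Demo = Tuple[str, str, str]  # (response before, feedback, response after)


@dataclass
class Constraint:
    kind: str          # constraint type; also the repository lookup key
    verify: Verifier   # rule-based checker for this single constraint


def max_words(limit: int) -> Constraint:
    def check(resp: str) -> Optional[str]:
        count = len(resp.split())
        if count <= limit:
            return None
        return f"Response has {count} words; limit is {limit}."
    return Constraint("max_words", check)


def all_lowercase() -> Constraint:
    def check(resp: str) -> Optional[str]:
        return None if resp == resp.lower() else "Response must be entirely lowercase."
    return Constraint("lowercase", check)


@dataclass
class RefinementRepository:
    # Successful refinements grouped by constraint type.
    store: Dict[str, List[Demo]] = field(default_factory=dict)

    def add(self, kind: str, demo: Demo) -> None:
        self.store.setdefault(kind, []).append(demo)

    def demos(self, kind: str, k: int = 2) -> List[Demo]:
        # The k most recent successful refinements for this constraint type.
        return self.store.get(kind, [])[-k:]


# Placeholder for an LLM call: (response, feedback, few-shot demos) -> revision.
Refiner = Callable[[str, str, List[Demo]], str]


def divide_verify_refine(response: str, constraints: List[Constraint],
                         refine: Refiner, repo: RefinementRepository,
                         max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        all_passed = True
        # Divide: each single constraint is checked on its own.
        for c in constraints:
            feedback = c.verify(response)  # Verify: tool-generated feedback
            if feedback is None:
                continue
            all_passed = False
            # Refine: revise using the feedback plus retrieved past successes.
            candidate = refine(response, feedback, repo.demos(c.kind))
            if c.verify(candidate) is None:
                # Only tool-verified refinements are kept and stored as demos.
                repo.add(c.kind, (response, feedback, candidate))
                response = candidate
        # A fix for one constraint may break another, so loop until all pass.
        if all_passed:
            break
    return response


def toy_refiner(resp: str, feedback: str, demos: List[Demo]) -> str:
    # Deterministic stand-in for an LLM; it ignores the demos.
    if "limit is" in feedback:
        limit = int(feedback.rstrip(".").rsplit(" ", 1)[1])
        return " ".join(resp.split()[:limit])
    if "lowercase" in feedback:
        return resp.lower()
    return resp


repo = RefinementRepository()
out = divide_verify_refine("This Is A Fairly Long Answer Here",
                           [max_words(5), all_lowercase()], toy_refiner, repo)
print(out)  # this is a fairly long
```

Storing a refinement only after the tool confirms it mirrors the abstract's point that feedback should come from reliable verification rather than from the model's own judgment. Each stored triple then becomes a few-shot demonstration the next time the same constraint type fails.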

RuleR: Improving LLM Controllability by Rule-based Data Recycling

Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

Failures Pave the Way: Enhancing Large Language Models Through Tuning-free Rule Accumulation

RNR: Teaching Large Language Models to Follow Roles and Rules

Distilling Task-specific Logical Rules from Large Pre-trained Models

Can LLMs Follow Simple Rules?

Rule-based Data Selection for Large Language Models

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

Reasoning Makes Good Annotators: An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction

Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions

Learning from "Silly" Questions Improves Large Language Models, But Only Slightly

RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Automatic Adaptation Rule Optimization via Large Language Models

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints

STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators