Abstract: While most research on controllable text generation has focused on steering base Language Models, the emerging instruction-tuning and prompting paradigm offers an alternate approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, using a subset of it to benchmark the performance of 9 different baselines and methods on Instruction-tuned Language Models. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation specifically with Instruction-tuned Language Models. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and a Large Language Model with in-context capabilities to automatically generate a constraint dataset. This method eliminates the field's dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.

What problem does this paper attempt to address?

The paper asks how well different methods control the output of large language models (LLMs) in the era of instruction tuning. Specifically, it aims to:

1. **Investigate whether common controllable text generation problems still pose challenges to instruction-tuned LLMs**: The paper focuses on controllability in the instruction-tuning setting, particularly in tasks such as toxicity avoidance, sentiment control, and topic control.
2. **Evaluate whether methods that enhance the controllability of base LLMs also apply to instruction-tuned LLMs**: The paper compares baselines and controllable text generation methods on instruction-tuned models.
3. **Compare controllable text generation methods with prompt-based methods**: The paper finds that, on most datasets and tasks, prompt-based methods outperform traditional controllable text generation methods. They match human performance on most stylistic tasks but lag on structural tasks.

To support this, the paper proposes an algorithm that automatically generates a constraint dataset using only a task dataset and an LLM with in-context learning capabilities. This removes the reliance on pre-curated constraint datasets and expands the range of constraints that can be studied in the future. The paper also compiles and releases ConGenBench, a testbed of 17 controllable generation tasks, and uses a subset of it to benchmark 9 baselines and methods on instruction-tuned LLMs.
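The summary does not spell out the steps of the dataset-generation algorithm, so the following is only a minimal sketch of one plausible reading: prompt an LLM to write text that satisfies a constraint and text that violates it, then keep the prompted intent as a weak label. The function `build_constraint_dataset`, the instruction wording, and the stand-in `toy_llm` are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def build_constraint_dataset(
    task_prompts: List[str],
    constraint: str,
    generate: Callable[[str], str],
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs for one constraint.

    label 1: the LLM was asked to satisfy the constraint.
    label 0: the LLM was asked to violate it.
    These are weak labels (assumption): nothing here verifies
    that the generated text actually matches the request.
    """
    dataset: List[Tuple[str, int]] = []
    for prompt in task_prompts:
        for label, instruction in (
            (1, f"Write a response that is {constraint}."),
            (0, f"Write a response that is not {constraint}."),
        ):
            # The instruction goes on the first line, the task prompt below it.
            text = generate(f"{instruction}\n\n{prompt}")
            dataset.append((text, label))
    return dataset


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (hypothetical), so the sketch runs offline."""
    first_line = prompt.splitlines()[0]
    return "bad" if " not " in first_line else "good"


if __name__ == "__main__":
    pairs = build_constraint_dataset(["Review the film."], "positive", toy_llm)
    for text, label in pairs:
        print(label, text)
```

In practice `generate` would wrap whichever instruction-tuned model is available, and the resulting pairs could train a constraint classifier of the kind that controllable generation methods typically require.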

Controllable Text Generation in the Instruction-Tuning Era

Controllable Text Generation with Language Constraints

Controlled Text Generation with Natural Language Instructions

GenQA: Generating Millions of Instructions from a Handful of Prompts

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

Controllable Text Generation for Open-Domain Creativity and Fairness

Context-dependent Instruction Tuning for Dialogue Response Generation

Controlled Text Generation using T5 based Encoder-Decoder Soft Prompt Tuning and Analysis of the Utility of Generated Text in AI

Controllable Navigation Instruction Generation with Chain of Thought Prompting

Controlled Text Generation as Continuous Optimization with Multiple Constraints

Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation

Controllable Text Generation Using Semantic Control Grammar

Toward Unified Controllable Text Generation via Regular Expression Instruction

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design

Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints

Instruction Tuning with Human Curriculum

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Generation