Abstract: Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length constraints. Such models are superior in length instructed evaluations, outperforming standard instruction following models such as GPT4, Llama 3 and Mixtral.

What problem does this paper attempt to address?

The paper addresses length bias in instruction-following models. Current evaluation methods tend to favor longer responses, and training algorithms exploit this bias by learning to generate longer responses. This leads to two main problems:

1. **Evaluation Bias**: Both human and model judges tend to prefer longer responses, which skews evaluation results.
2. **Training Bias**: Because the evaluation signal is biased, the reward used during training is biased as well, and the model learns to produce excessively long responses.

To solve these problems, the paper proposes adding length-constraint instructions to the training data, so that the model learns to respect a length limit stated in the prompt. The output length can then be controlled at inference time, and the model performs better on tasks that include length instructions. The paper also describes how to construct new benchmarks with length constraints (AlpacaEval-LI and MT-Bench-LI) and validates the proposed method experimentally.
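As a rough illustration of what adding a length constraint to an instruction can look like, the sketch below prepends a word limit to a prompt and checks whether a response respects it. The template wording, the use of whitespace-separated words as the length unit, and all function names (`add_length_instruction`, `word_count`, `satisfies_limit`, `pick_preferred`) are assumptions made for this example, not details confirmed by the summary above.

```python
from typing import Optional


def add_length_instruction(prompt: str, max_words: int) -> str:
    """Prepend a length constraint to a prompt (template wording is assumed)."""
    return (
        f"Answer the following instruction using {max_words} words or less.\n\n"
        f"{prompt}"
    )


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed length measure)."""
    return len(text.split())


def satisfies_limit(response: str, max_words: int) -> bool:
    """Return True if the response stays within the word limit."""
    return word_count(response) <= max_words


def pick_preferred(resp_a: str, resp_b: str, max_words: int) -> Optional[str]:
    """Prefer the response that respects the limit when only one does.

    Returns None when both or neither satisfy the limit, since the length
    rule alone cannot decide that case and a quality judgment is needed.
    """
    a_ok = satisfies_limit(resp_a, max_words)
    b_ok = satisfies_limit(resp_b, max_words)
    if a_ok and not b_ok:
        return resp_a
    if b_ok and not a_ok:
        return resp_b
    return None


if __name__ == "__main__":
    print(add_length_instruction("Explain what DNS does.", 50))
    print(pick_preferred("short answer", "a much longer answer here", 3))
```

A rule like `pick_preferred` could be used to label preference pairs for length-instructed prompts, with an over-limit response treated as the rejected one regardless of its other qualities; whether the paper builds its training pairs in exactly this way is not stated here.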

Following Length Constraints in Instructions