Offline Prompt Polishing for Low Quality Instructions

Jia Yu, Zhanchao Zhou, Long Li, Ling Li, Yuming Yan, Renjun Xu, Zhenzhong Lan
DOI: https://doi.org/10.1016/j.neucom.2024.128046
IF: 6
2024-01-01
Neurocomputing
Abstract: Instruction tuning is an effective way to make large language models (LLMs) better at following real users' instructions. However, aligning models to human preferences in real user scenarios is challenging, because the instructions a model receives are usually not well formatted. In this paper, we introduce offline prompt polishing and the insertion of specific delimiters into instructions before they are fed to the model, to cope with these poorly formatted instructions. To better understand how users phrase instructions and how language models align to them, we introduce the User-based Instructional Dataset (UID), a dataset of over 96,000 instruction-response pairs, including over 3k human-revised free-form instructions collected from real-world scenarios. Within UID, we keep both the original and the revised instructions to improve model robustness. Through offline prompt polishing and delimiter insertion, we obtain a series of IOPT checkpoints: OPT models ranging from 125M to 13B parameters trained on UID. The results demonstrate that IOPT-2.7B trained on 6,000 instances can achieve performance comparable to a 175B InstructGPT. In addition, we rigorously measure the impact of several factors, including data volume, model size, and instruction format, on alignment to real users' instructions. We summarize several findings to shed light on instruction tuning in user scenarios. Our dataset will be made public upon acceptance.
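To make the data preparation described in the abstract more concrete, the following is a minimal Python sketch of delimiter insertion in which both the original and the human-revised instruction are kept as separate training examples. It is an illustration under stated assumptions, not the authors' implementation: the delimiter strings, the `Record` fields, and the function names are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical delimiter strings; the abstract does not say which ones are used.
INSTRUCTION_DELIM = "### Instruction:"
RESPONSE_DELIM = "### Response:"


@dataclass
class Record:
    original: str  # instruction as the user wrote it
    revised: str   # human-polished version of the same instruction
    response: str  # target response shared by both versions


def format_example(instruction: str, response: str) -> str:
    """Wrap one instruction-response pair in explicit delimiters."""
    return (
        f"{INSTRUCTION_DELIM}\n{instruction.strip()}\n"
        f"{RESPONSE_DELIM}\n{response.strip()}"
    )


def build_training_texts(records: list[Record]) -> list[str]:
    """Emit one example per distinct instruction variant (original and revised)."""
    texts = []
    for record in records:
        variants = [record.original]
        # Skip the revised form when polishing changed nothing but whitespace.
        if record.revised.strip() != record.original.strip():
            variants.append(record.revised)
        texts.extend(format_example(v, record.response) for v in variants)
    return texts
```

Keeping both variants means the model sees the noisy and the polished phrasing paired with the same response, which is one plausible reading of how retaining original and revised instructions could improve robustness.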