DoG-Instruct: Towards Premium Instruction-Tuning Data Via Text-Grounded Instruction Wrapping

Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi
DOI: https://doi.org/10.18653/v1/2024.naacl-long.230
2024-01-01
Abstract: The improvement of LLMs' instruction-following capabilities relies heavily on the availability of high-quality instruction-response pairs. Unfortunately, the current methods used to collect the pairs suffer from either unaffordable labor costs or severe hallucinations in the self-generation of LLM. To tackle these challenges, this paper proposes a scalable solution. It involves training LLMs to generate instruction-response pairs based on human-written documents, rather than relying solely on self-generation without context. Our proposed method not only exploits the advantages of human-written documents in reducing hallucinations but also utilizes an LLM to wrap the expression of documents, which enables us to bridge the gap between various document styles and the standard AI response. Experiments demonstrate that our method outperforms existing typical methods on multiple benchmarks. In particular, compared to the best-performing baseline, the LLM trained using our generated dataset exhibits a 10% relative improvement in performance on AlpacaEval, despite utilizing only 1/5 of its training data. Furthermore, a comprehensive manual evaluation validates the quality of the data we generated. Our trained wrapper is publicly available at https://github.com/Bahuia/Dog-Instruct.