SelectIT: Selective Instruction Tuning for Large Language Models Via Uncertainty-Aware Self-Reflection

Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
DOI: https://doi.org/10.48550/arxiv.2402.16705
2024-01-01
Abstract: Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data sets, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a novel IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.