Abstract: Vision-language supervised fine-tuning is effective in enhancing the performance of Vision Large Language Models (VLLMs). However, existing visual instruction tuning datasets have the following limitations: (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impair the model's ability to generate diverse outputs that are closer to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973K instructions from 24 domains. There are four instruction types: Judgement, Multiple-Choice, Long Visual Question Answering, and Short Visual Question Answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation studies, we demonstrate that MMInstruct can significantly improve the performance of VLLMs, e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data will be available at https://github.com/yuecao0119/MMInstruct.

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace

Harnessing the Power of David against Goliath: Exploring Instruction Data Generation without Using Closed-Source Models

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Instruction Tuning with Human Curriculum

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design

Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search

Maybe Only 0.5 Training Data Instruction Tuning

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

InstructCoder: Instruction Tuning Large Language Models for Code Editing

IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection

Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

Instruction Mining: Instruction Data Selection for Tuning Large Language Models

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

Instruction Tuning for Large Language Models: A Survey

LongForm: Effective Instruction Tuning with Reverse Instructions