Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian

2024-11-26

Abstract: Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness. However, current methods often rely heavily on MLLMs themselves as judges, leading to high computational costs and potential pitfalls like reward hacking and model collapse. This paper introduces a novel, model-level judge-free self-improvement framework. Our approach employs a controlled feedback mechanism while eliminating the need for MLLMs in the verification loop. We generate preference learning pairs using a controllable hallucination mechanism and optimize data quality by leveraging lightweight, contrastive language-image encoders to evaluate and reverse pairs when necessary. Evaluations across public benchmarks and our newly introduced IC dataset designed to challenge hallucination control demonstrate that our model outperforms conventional techniques. We achieve superior precision and recall with significantly lower computational demands. This method offers an efficient pathway to scalable self-improvement in MLLMs, balancing performance gains with reduced resource requirements.
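The encoder-based verification step in the abstract ("evaluate and reverse pairs when necessary") can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each candidate pair arrives with an intended ordering (hypothetically, the response generated at the lower hallucination ratio as `chosen`), and that image and text embeddings were already computed with a CLIP-style encoder (for example via `get_image_features` and `get_text_features` on Hugging Face's `CLIPModel`). The names `PreferencePair`, `cosine`, and `verify_pair` are hypothetical, and the toy vectors stand in for real embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # intended preferred response (assumed: lower hallucination ratio)
    rejected: str  # intended dispreferred response (assumed: higher hallucination ratio)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def verify_pair(pair, image_emb, text_embs):
    """Keep the pair if the encoder agrees with its ordering; otherwise swap it.

    `image_emb` is the image embedding and `text_embs` maps each response string
    to its text embedding (both hypothetical inputs from a CLIP-style encoder).
    """
    s_chosen = cosine(image_emb, text_embs[pair.chosen])
    s_rejected = cosine(image_emb, text_embs[pair.rejected])
    if s_rejected > s_chosen:
        # The encoder finds the "rejected" text closer to the image: reverse the pair.
        return PreferencePair(pair.prompt, chosen=pair.rejected, rejected=pair.chosen)
    return pair


if __name__ == "__main__":
    # Toy 2-d embeddings; real CLIP embeddings have hundreds of dimensions.
    image_emb = [1.0, 0.0]
    text_embs = {"a red car": [0.2, 1.0], "a blue car": [1.0, 0.1]}
    pair = PreferencePair("Describe the image.", chosen="a red car", rejected="a blue car")
    print(verify_pair(pair, image_emb, text_embs).chosen)  # prints: a blue car
```

A pair is swapped only when the encoder scores the intended `rejected` response as closer to the image; otherwise it is returned unchanged. Since the check needs just one image embedding, two text embeddings, and two dot products per pair, it is much cheaper than running an MLLM as judge, consistent with the abstract's emphasis on lower computational demands.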

Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

This paper attempts to solve several key problems that multimodal large language models (MLLMs) face during self-improvement:

1. **High computational cost**: Existing self-improvement methods usually rely on the MLLMs themselves as judges, which makes them computationally expensive.
2. **Potential risks**: Using MLLMs as judges can introduce problems such as reward hacking and model collapse.
3. **Inefficient data generation and verification**: Traditional self-improvement pipelines generate and verify data inefficiently; many samples are generated, but only a small fraction is used.

To overcome these problems, the paper proposes a model-level judge-free self-improvement framework that removes the MLLM from the verification loop. Specifically, the framework works through the following steps:

- **Controllable hallucination mechanism**: Adjusting the hallucination ratio produces positive and negative sample pairs for preference learning.
- **Lightweight verification**: A contrastive language-image encoder evaluates each pair and reverses it when necessary, improving data quality.
- **Direct preference optimization (DPO)**: The seed model is trained on the refined dataset to improve its performance.

The method is evaluated on multiple public benchmarks and on the newly introduced IC dataset. The results show that it achieves higher precision and recall while significantly reducing computational requirements. It thus offers an efficient path to scalable self-improvement of MLLMs, balancing performance gains against resource requirements.
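The DPO step trains the seed model on each (chosen, rejected) pair with the standard DPO objective, loss = -log σ(β · [(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). The sketch below computes that loss for a single pair, assuming the sequence log-probabilities have already been summed over response tokens for both the policy being trained and a frozen reference model (in DPO the reference is typically the starting model, here presumably the seed model). The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions; the summary does not state the paper's hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch on the sign of z so exp never overflows.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference: log 2, about 0.693
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))   # chosen response made more likely: below log 2
```

When the policy still matches the reference, the margin is zero and the loss equals log 2; raising the policy's log-probability of the chosen response relative to the reference pushes the loss below that value. The branch on the sign of `z` keeps the computation stable for large margins, where a naive `-math.log(1 / (1 + math.exp(-z)))` would raise an `OverflowError` for very negative `z`.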
