Abstract: Significant advancements have recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental question: Can we develop a unified approach to represent and handle different multi-modal tasks to maximize the generalizability of MLLMs? In this paper, we propose UnifiedMLLM, a comprehensive model designed to represent various tasks using a unified representation. Our model exhibits strong capabilities in comprehending the implicit intent of user instructions and performing reasoning. In addition to generating textual responses, our model also outputs task tokens and grounding tokens, which serve as indicators of task type and task granularity. These outputs are subsequently routed through the task router and directed to specific expert models for task completion. To train our model, we construct a task-specific dataset and a 100k multi-task dataset encompassing complex scenarios. Employing a three-stage training strategy, we equip our model with robust reasoning and task processing capabilities while preserving its generalization capacity and knowledge reservoir. Extensive experiments showcase the impressive performance of our unified representation approach across various tasks, surpassing existing methodologies. Furthermore, our approach exhibits exceptional scalability and generality. Our code, model, and dataset will be available at https://github.com/lzw-lzw/UnifiedMLLM.
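The routing step described in the abstract (task tokens and grounding tokens passed through a task router to expert models) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `<task>…</task>` and `<region>…</region>` token syntax, the function names `parse_output` and `route`, and the two placeholder experts are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical token syntax (an assumption, not the paper's actual format):
# a task token <task>NAME</task> and a grounding token <region>TEXT</region>.
TASK_RE = re.compile(r"<task>(.*?)</task>")
REGION_RE = re.compile(r"<region>(.*?)</region>")


def parse_output(text: str) -> Tuple[Optional[str], Optional[str], str]:
    """Split a model output into (task, region, plain textual response)."""
    task = TASK_RE.search(text)
    region = REGION_RE.search(text)
    # Remove both kinds of special tokens to recover the plain response.
    plain = REGION_RE.sub("", TASK_RE.sub("", text)).strip()
    return (
        task.group(1) if task else None,
        region.group(1) if region else None,
        plain,
    )


# Placeholder experts standing in for real downstream models.
def edit_expert(region: Optional[str], response: str) -> str:
    return f"edit[{region}]"


def segment_expert(region: Optional[str], response: str) -> str:
    return f"segment[{region}]"


# Hypothetical registry mapping task names to expert callables.
EXPERTS: Dict[str, Callable[[Optional[str], str], str]] = {
    "edit": edit_expert,
    "segment": segment_expert,
}


def route(text: str) -> str:
    """Dispatch a model output to an expert, or return plain text as-is."""
    task, region, plain = parse_output(text)
    if task is None:
        return plain  # no task token: an ordinary textual answer
    expert = EXPERTS.get(task)
    if expert is None:
        raise KeyError(f"no expert registered for task {task!r}")
    return expert(region, plain)
```

In this sketch, an output without a task token is returned unchanged as a text answer, while an output such as `"Sure. <task>edit</task><region>the red car</region>"` would be handed to the placeholder editing expert together with the grounded region. Adding a new task under this design only requires registering another entry in `EXPERTS`, which is one plausible reading of the scalability claim.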

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents

UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

Large Language Models Synergize with Automated Machine Learning

ControlLLM: Augment Language Models with Tools by Searching on Graphs

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Can Language Models Pretend Solvers? Logic Code Simulation with LLMs

LLM4RL: Enhancing Reinforcement Learning with Large Language Models

Towards Large Language Models as Copilots for Theorem Proving in Lean

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

Control Industrial Automation System with Large Language Models

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework