ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation

Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang
2024-02-18
Abstract: The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, they still struggle to accommodate the diverse and specific needs of users and simplify the utilization of AI models for the average user. In response, we propose ModelGPT, a novel framework designed to determine and generate AI models specifically tailored to the data or task descriptions provided by the user, leveraging the capabilities of LLMs. Given user requirements, ModelGPT is able to provide tailored models at most 270x faster than the previous paradigms (e.g. all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV, and Tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at https://github.com/IshiKura-a/ModelGPT.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper proposes ModelGPT, a framework that addresses how efficiently and conveniently large language models (LLMs) can meet diverse user needs and specific requirements. Although LLMs perform well at automating tasks, they are difficult to deploy independently, consume substantial resources, and are hard to optimize for specific domains. ModelGPT uses an LLM to interpret user requirements and then generates customized small-scale models suited to the given data or task description. Compared with earlier fine-tuning methods such as full-parameter or LoRA fine-tuning, ModelGPT delivers customized models up to 270 times faster while maintaining comparable performance. Experiments on natural language processing, computer vision, and tabular datasets demonstrate its efficiency and usability. The paper also discusses how combining the strengths of large and small models could help move toward artificial general intelligence (AGI).
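The summary says ModelGPT turns a requirement description into a customized small model without per-task fine-tuning. The following is a minimal sketch of one way such generation could work, assuming a hypernetwork that maps a description embedding to LoRA-style low-rank weights in a single forward pass. This illustrates the general hypernetwork idea and is not taken from the ModelGPT code. `embed_description`, `HyperNetwork`, and `adapted_forward` are hypothetical names, and the hashed bag-of-words embedding is a toy stand-in for a real LLM encoder.

```python
import hashlib
import random


def embed_description(text: str, dim: int) -> list[float]:
    """Hypothetical stand-in for an LLM text encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a stable bucket index across runs (unlike Python's built-in hash for str).
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class HyperNetwork:
    """Hypothetical hypernetwork: maps a description embedding to low-rank factors A and B."""

    def __init__(self, emb_dim: int, n_in: int, n_out: int, rank: int, seed: int = 0):
        self.n_in, self.n_out, self.rank = n_in, n_out, rank
        n_params = rank * n_in + n_out * rank  # sizes of A (rank x n_in) and B (n_out x rank)
        rng = random.Random(seed)
        # A single linear layer from emb_dim to all adapter parameters, flattened.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n_params)]

    def generate(self, emb: list[float]) -> tuple[list[list[float]], list[list[float]]]:
        flat = matvec(self.weight, emb)  # one forward pass, no gradient steps
        split = self.rank * self.n_in
        a = [flat[i * self.n_in:(i + 1) * self.n_in] for i in range(self.rank)]
        b = [flat[split + j * self.rank:split + (j + 1) * self.rank] for j in range(self.n_out)]
        return a, b


def adapted_forward(w0, a, b, x):
    """y = W0 x + B (A x): frozen base weights plus the generated low-rank update."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    return [u + v for u, v in zip(base, delta)]


if __name__ == "__main__":
    hyper = HyperNetwork(emb_dim=16, n_in=4, n_out=3, rank=2)
    w0 = [[0.0] * 4 for _ in range(3)]  # toy frozen base layer
    a, b = hyper.generate(embed_description("classify sentiment of movie reviews", 16))
    print(adapted_forward(w0, a, b, [1.0, 0.5, -0.5, 2.0]))
```

In this toy, producing a new task-specific adapter costs one matrix-vector product instead of an optimization loop. That is the kind of saving that could explain speedups over all-parameter or LoRA fine-tuning. The reported figure of up to 270x comes from the paper's experiments, not from this sketch.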