Deploying Multi-task Online Server with Large Language Model

Yincen Qu, Chao Ma, Yiting Wu, Xiangying Dai, Hui Zhou, Hengyue Liu

2024-11-06

Abstract: In the industry, numerous tasks are deployed online. Traditional approaches often tackle each task separately with its own network, which leads to excessive costs for developing and scaling models, especially in the context of large language models. Although multi-task methods can save costs through parameter sharing, they often struggle to outperform single-task methods in real-world applications. To tackle these challenges, we present a three-stage multi-task learning framework for large language models. It involves task filtering, followed by fine-tuning on high-resource tasks, and finally fine-tuning on all tasks. We conducted comprehensive experiments in single-task and multi-task settings. Our approach, exemplified on different benchmarks, demonstrates that it is able to achieve performance comparable to the single-task method while reducing up to 90.9% of its overhead.

Computation and Language, Artificial Intelligence
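The abstract reports savings of up to 90.9% but does not say how that figure is computed. As a purely hypothetical cost model, suppose overhead is proportional to the number of deployed model copies: serving N tasks from one shared model instead of N single-task models then saves 1 - 1/N of the overhead. The helper `overhead_reduction` below is illustrative only and is not taken from the paper.

```python
def overhead_reduction(num_tasks: int) -> float:
    """Fraction of overhead saved by serving num_tasks tasks from one shared
    model instead of one model per task, ASSUMING overhead is proportional
    to the number of deployed model copies (a hypothetical cost model)."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be >= 1")
    single_task_cost = num_tasks  # one model copy per task
    multi_task_cost = 1           # one shared model serving every task
    return 1 - multi_task_cost / single_task_cost


for n in (2, 5, 11):
    print(n, f"{overhead_reduction(n):.1%}")  # 2 50.0%, 5 80.0%, 11 90.9%
```

Under this assumption, N = 11 happens to yield 90.9%; whether that matches the paper's actual setup cannot be determined from this summary.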

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the cost and performance challenges of deploying multi-task online services in industrial applications. Specifically:

1. **Cost problem**: As the number of online tasks grows, so do resource requirements. Handling each task with its own network and pipeline leads to heavy development and maintenance workloads and increases latency and resource usage. With large language models (LLMs) in particular, scaling a separate model for every task is very costly.
2. **Performance problem**: Although multi-task methods save costs through parameter sharing, in practice they often fail to outperform single-task methods. The main causes are negative transfer and over-fitting, which arise from data imbalance and task heterogeneity.

To address these problems, the authors propose a three-stage multi-task learning framework:

1. **Task filtering**: Filter out dissimilar tasks to avoid negative transfer.
2. **Fine-tuning on high-resource tasks**: First fine-tune on the high-resource tasks to balance the number of training steps across tasks.
3. **Fine-tuning on mixed tasks**: Finally, fine-tune on the mixed data of all tasks so that the model learns from multiple tasks jointly.

With this framework, the authors aim to greatly reduce resource overhead while keeping performance comparable to that of the single-task method. Experimental results show that the method performs well on multiple benchmarks and reduces overhead by up to 90.9%.
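The three stages can be sketched as a data-scheduling routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how task similarity is measured, so each task carries a hypothetical `embedding` vector, and a task is kept only if its cosine similarity to at least one other task reaches `min_sim`. The `high_resource_min` threshold and the task names are likewise invented. The code only builds the ordered training data for stages 2 and 3; it does not train a model.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    examples: list   # training examples for this task
    embedding: list  # HYPOTHETICAL task vector, used only for filtering


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_tasks(tasks, min_sim=0.5):
    """Stage 1 (assumed criterion): drop tasks not similar to any other task,
    as a stand-in for removing dissimilar tasks that may cause negative transfer."""
    kept = []
    for t in tasks:
        sims = [cosine(t.embedding, o.embedding) for o in tasks if o is not t]
        if not sims or max(sims) >= min_sim:
            kept.append(t)
    return kept


def build_schedule(tasks, min_sim=0.5, high_resource_min=1000, seed=0):
    """Return (kept tasks, stage-2 data, stage-3 data) as shuffled (task, example) pairs."""
    rng = random.Random(seed)
    kept = filter_tasks(tasks, min_sim)
    # Stage 2: fine-tune on high-resource tasks only (hypothetical size threshold).
    high = [t for t in kept if len(t.examples) >= high_resource_min]
    stage2 = [(t.name, ex) for t in high for ex in t.examples]
    rng.shuffle(stage2)
    # Stage 3: fine-tune on the mixed data of all tasks that survived filtering.
    stage3 = [(t.name, ex) for t in kept for ex in t.examples]
    rng.shuffle(stage3)
    return kept, stage2, stage3


# Toy usage with made-up tasks and 2-D embeddings.
tasks = [
    Task("intent", ["example"] * 3000, [1.0, 0.1]),
    Task("slot", ["example"] * 200, [0.9, 0.2]),
    Task("ocr", ["example"] * 5000, [0.0, 1.0]),
]
kept, stage2, stage3 = build_schedule(tasks)
print([t.name for t in kept], len(stage2), len(stage3))  # ['intent', 'slot'] 3000 3200
```

In a real setup, each list would feed an LLM fine-tuning loop, run in order: stage 2 first, then stage 3 starting from the stage-2 weights.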

Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models

Enhancing Subtask Performance of Multi-modal Large Language Model

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Large Language Model as a Universal Clinical Multi-task Decoder

Cross-model Control: Improving Multiple Large Language Models in One-time Training

12-in-1: Multi-Task Vision and Language Representation Learning

Enhancing Robot Task Planning and Execution through Multi-Layer Large Language Models

An Efficient 2D Method for Training Super-Large Deep Learning Models

Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems

Efficient and Economic Large Language Model Inference with Attention Offloading

Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer

A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm

Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

Multilevel Large Language Models for Everyone

Distributed Training of Large Language Models

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

An End-to-End Scalable Iterative Sequence Tagging with Multi-Task Learning.