Deploying Multi-task Online Server with Large Language Model

Yincen Qu, Chao Ma, Yiting Wu, Xiangying Dai, Hui Zhou, Hengyue Liu
2024-11-06
Abstract: In the industry, numerous tasks are deployed online. Traditional approaches often tackle each task separately by its own network, which leads to excessive costs for developing and scaling models, especially in the context of large language models. Although multi-task methods can save costs through parameter sharing, they often struggle to outperform single-task methods in real-world applications. To tackle these challenges, we present a three-stage multi-task learning framework for large language models. It involves task filtering, followed by fine-tuning on high-resource tasks, and finally fine-tuning on all tasks. We conducted comprehensive experiments in single-task and multi-task settings. Our approach, exemplified on different benchmarks, demonstrates that it is able to achieve performance comparable to the single-task method while reducing up to 90.9% of its overhead.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the cost and performance challenges of deploying multi-task online services in industrial applications. Specifically:

1. **Cost problem**: As the number of online tasks grows, so do the resource requirements. Handling each task with its own network and pipeline leads to excessive development and maintenance work and increases latency and resource usage. With large language models (LLMs) the cost is especially high, because a separate model has to be developed and scaled for every task.
2. **Performance problem**: Although multi-task methods can save costs through parameter sharing, in practice they often struggle to outperform single-task methods. The main causes are negative transfer and overfitting, which arise from data imbalance and task heterogeneity.

To solve these problems, the authors propose a three-stage multi-task learning framework:

1. **Task filtering**: Filter out dissimilar tasks to avoid negative transfer.
2. **Fine-tuning on high-resource tasks**: Fine-tune on the high-resource tasks first, to balance the number of training steps across tasks.
3. **Fine-tuning on mixed tasks**: Finally, fine-tune on the mixed data of all tasks so that the model can learn from multiple tasks.

With this framework, the authors aim to significantly reduce resource overhead while keeping performance comparable to that of the single-task method. The reported experiments on several benchmarks show comparable performance, with overhead reduced by up to 90.9%.
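
The three stages described above can be sketched as a small planning routine. This is only an illustration of the ordering of the stages, not the authors' implementation: the `Task` fields, the boolean similarity flag, the example-count threshold for "high-resource", and the choice to run the final stage on the tasks that survive filtering are all assumptions made here, since the summary does not say how similarity or resource level is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    num_examples: int
    # Hypothetical flag: the source does not specify how task similarity is measured.
    similar_to_others: bool


def plan_three_stage_training(tasks, high_resource_threshold):
    """Return which task names take part in each stage of the framework.

    high_resource_threshold is an assumed cutoff on the number of examples.
    """
    # Stage 1: task filtering. Dissimilar tasks are dropped to avoid negative transfer.
    kept = [t for t in tasks if t.similar_to_others]
    filtered_out = [t for t in tasks if not t.similar_to_others]

    # Stage 2: fine-tune on high-resource tasks only.
    high_resource = [t for t in kept if t.num_examples >= high_resource_threshold]

    # Stage 3: fine-tune on the mixed data of all remaining tasks (assumption:
    # "all tasks" means the tasks kept after filtering).
    return {
        "filtered_out": [t.name for t in filtered_out],
        "stage2_high_resource": [t.name for t in high_resource],
        "stage3_all_tasks": [t.name for t in kept],
    }


if __name__ == "__main__":
    # Made-up tasks and sizes, purely for illustration.
    example_tasks = [
        Task("intent_classification", 200_000, True),
        Task("slot_filling", 3_000, True),
        Task("poetry_generation", 50_000, False),
    ]
    for stage, names in plan_three_stage_training(example_tasks, 10_000).items():
        print(stage, names)
```

In a real system each stage would invoke an actual fine-tuning run on the listed tasks; the sketch only makes the order and membership of the stages explicit.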