NetLLM: Adapting Large Language Models for Networking

Duo Wu, Xianda Wang, Yaqi Qiao, Zhi Wang, Junchen Jiang, Shuguang Cui, Fangxin Wang
DOI: https://doi.org/10.1145/3651890.3672268
2024-08-06
Abstract: Many networking tasks now employ deep learning (DL) to solve complex prediction and optimization problems. However, current design philosophy of DL-based algorithms entails intensive engineering overhead due to the manual design of deep neural networks (DNNs) for different networking tasks. Besides, DNNs tend to achieve poor generalization performance on unseen data distributions/environments. Motivated by the recent success of large language models (LLMs), this work studies the LLM adaptation for networking to explore a more sustainable design philosophy. With the powerful pre-trained knowledge, the LLM is promising to serve as the foundation model to achieve "one model for all tasks" with even better performance and stronger generalization. In pursuit of this vision, we present NetLLM, the first framework that provides a coherent design to harness the powerful capabilities of LLMs with low efforts to solve networking problems. Specifically, NetLLM empowers the LLM to effectively process multimodal data in networking and efficiently generate task-specific answers. Besides, NetLLM drastically reduces the costs of fine-tuning the LLM to acquire domain knowledge for networking. Across three networking-related use cases - viewport prediction, adaptive bitrate streaming and cluster job scheduling, we showcase that the NetLLM-adapted LLM significantly outperforms state-of-the-art algorithms.
Networking and Internet Architecture, Machine Learning
What problem does this paper attempt to address?
### The Problem This Paper Attempts to Solve

This paper aims to address two key issues in current deep learning (DL)-based approaches to networking tasks:

1. **High Model Engineering Cost**:
   - The current philosophy of designing DL algorithms incurs significant engineering overhead because deep neural networks (DNNs) must be designed manually for each networking task. The complex structures of these DNNs make them difficult and time-consuming to design, implement, and verify.
   - Different networking tasks cannot share the same DNN model, so a specialized DNN must be designed for each task (i.e., one model per task), further increasing engineering costs.
2. **Low Generalization Ability**:
   - DNNs trained on specific data distributions or environments may perform poorly on unseen data distributions or environments, sometimes even worse than traditional rule-based algorithms.
   - For example, an adaptive bitrate (ABR) model trained under stable network conditions may perform poorly in a network environment with significant bandwidth fluctuations.

This lack of generalization ultimately hinders the widespread adoption of learning-based algorithms in practice, as network operators may doubt that these algorithms are superior in production environments.

To address these issues, the paper proposes a new framework, NetLLM, which adapts large language models (LLMs) for networking tasks. By leveraging the powerful pre-trained knowledge of LLMs, NetLLM aims to achieve "one model for all tasks" with better performance and stronger generalization ability. Specifically, NetLLM addresses the issues above through the following innovations:

- **Multimodal Encoder**: An efficient multimodal encoder is designed to enable LLMs to handle the multimodal input information found in networking tasks.
- **Network Head Modules**: Various network head modules are introduced to generate task-specific answers directly, eliminating the need for word-by-word prediction and improving the efficiency of answer generation.
- **Data-Driven Low-Rank Network Adaptation (DD-LRNA) Scheme**: An efficient data-driven adaptation scheme is designed to significantly reduce the cost of fine-tuning LLMs to acquire domain knowledge.

Through these innovations, NetLLM not only significantly outperforms existing algorithms in three networking-related use cases (viewport prediction, adaptive bitrate streaming, and cluster job scheduling) but also demonstrates strong generalization ability in unseen test environments.
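The summary above names two mechanisms without showing their shape: a low-rank adaptation that keeps the pre-trained weights frozen, and a task head that emits an answer in one pass instead of word by word. The following is a minimal, generic sketch of both ideas in plain Python, not the paper's implementation. The class `LowRankLinear`, the function `bitrate_head`, the layer sizes, the rank, and the bitrate ladder are all hypothetical names and values chosen for illustration; the low-rank update follows the common LoRA-style pattern, and the details of DD-LRNA itself are not reproduced here.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LowRankLinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration of generic low-rank adaptation.
    """

    def __init__(self, d_in, d_out, rank, rng):
        # Stand-in for pre-trained weights; these stay frozen during adaptation.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is (rank x d_in), B is (d_out x rank).
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially equals the frozen one.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


def bitrate_head(features, head_weights, candidates_kbps):
    """Score every candidate bitrate in one pass and return the best one.

    The answer is always a member of candidates_kbps, so it is valid by
    construction and needs no token-by-token decoding.
    """
    scores = matvec(head_weights, features)
    best = max(range(len(candidates_kbps)), key=lambda i: scores[i])
    return candidates_kbps[best]


rng = random.Random(0)
layer = LowRankLinear(d_in=64, d_out=64, rank=4, rng=rng)
x = [rng.gauss(0, 1) for _ in range(64)]
features = layer.forward(x)

candidates = [300, 750, 1200, 1850, 2850, 4300]  # hypothetical ladder (kbps)
head = [[rng.gauss(0, 0.1) for _ in range(64)] for _ in range(len(candidates))]
choice = bitrate_head(features, head, candidates)

print(layer.trainable_params(), layer.frozen_params())  # 512 4096
print(choice in candidates)                              # True
```

In this toy setting only 512 parameters would be updated while the 4096 frozen ones are left untouched, which is the source of the fine-tuning savings that low-rank schemes aim for. The head's output is restricted to the candidate set, which illustrates why a dedicated head can answer in a single inference step.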