LLMaaS: Serving Large Language Models on Trusted Serverless Computing Platforms

Zinuo Cai, Rongbo Ma, Yicheng Fu, Weishan Zhang, Ruhui Ma, Haibing Guan
DOI: https://doi.org/10.1109/tai.2024.3429480
2024-01-01
IEEE Transactions on Artificial Intelligence
Abstract: In recent years, the emergence of large language models has profoundly transformed how we work and live. These models have shown tremendous potential in fields such as natural language processing, speech recognition, and recommendation systems, and they increasingly play crucial roles in applications such as human-computer interaction and intelligent customer service. Efficient inference for large language models in data centers has been studied extensively, with a focus on meeting users' Quality of Service (QoS) requirements. In this paper, we focus on two additional requirements that responsible large language model inference should meet under QoS constraints: security throughout model execution and low maintenance overhead for the inference system. To this end, we propose LLMaaS, a trusted model inference platform built on a serverless computing platform, which provides inference as a service for large language models. First, we design a trusted serverless computing platform based on SGX, which includes distributed identity verification and SGX device plugins to ensure that the inference process is secure and trustworthy. Second, to reduce the system's maintenance requirements, we enhance the SGX-based deep learning computing framework, including replacing PyTorch and using a greedy algorithm for graph partitioning. We evaluate the platform on four typical large models, and the experimental results show that it ensures the security of model execution with minimal overhead and few modifications to user code.
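The abstract mentions a greedy algorithm for graph partitioning without describing it. A minimal sketch of one common form of such an algorithm follows, assuming the model is a linear chain of layers and the goal is to keep each partition's memory footprint under a fixed enclave budget. The `Layer` type, the `mem_mb` field, and `greedy_partition` are hypothetical illustrations, not the LLMaaS implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    mem_mb: float  # hypothetical per-layer memory footprint in MB


def greedy_partition(layers: List[Layer], budget_mb: float) -> List[List[Layer]]:
    """Greedily pack consecutive layers into partitions whose total
    memory stays within budget_mb (assumed enclave memory budget)."""
    partitions: List[List[Layer]] = []
    current: List[Layer] = []
    used = 0.0
    for layer in layers:
        # A single layer larger than the budget cannot be placed at all.
        if layer.mem_mb > budget_mb:
            raise ValueError(f"layer {layer.name} alone exceeds the budget")
        # Close the current partition when the next layer would overflow it.
        if current and used + layer.mem_mb > budget_mb:
            partitions.append(current)
            current, used = [], 0.0
        current.append(layer)
        used += layer.mem_mb
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    model = [Layer("a", 40), Layer("b", 30), Layer("c", 50), Layer("d", 20), Layer("e", 60)]
    for part in greedy_partition(model, budget_mb=90):
        print([l.name for l in part])
```

With a 90 MB budget, this example yields the partitions `['a', 'b']`, `['c', 'd']`, and `['e']`. The greedy choice keeps each partition as large as possible, which tends to reduce the number of enclave transitions, but it is not guaranteed to minimize them for arbitrary (non-chain) graphs.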