Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu
2024-05-30
Abstract: The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strength of LLM and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. Public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a brand-new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the inefficiency of serving LLM-based applications through today's public LLM services. These services expose only a simple request-level API, so application-level information is lost and each request is optimized in isolation, which hurts end-to-end performance. The paper proposes an LLM service system called Parrot, which introduces the Semantic Variable abstraction to expose application-level knowledge to the service. A Semantic Variable annotates an input or output variable in the prompt of a request, and connecting multiple requests through these variables forms a data pipeline, which also makes LLM applications more natural to program. With Semantic Variables exposed, the service can perform conventional data flow analysis, discover correlations across multiple LLM requests, and use them to optimize the end-to-end performance of the application. The paper points out that consecutive LLM requests may depend on one another, may have different scheduling preferences, and may repeat a large amount of computation. Parrot uses Semantic Variables to address these issues: it reduces network latency, schedules requests more efficiently, and eliminates redundant computation. Experiments show that Parrot can achieve improvements of up to an order of magnitude in popular and practical LLM application use cases.
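The idea of annotating prompt variables and analyzing the resulting data flow can be illustrated with a small sketch. This is not Parrot's API: the `Request` class, the helper functions, and the map-reduce style summarization example are all hypothetical names chosen for illustration. The sketch assumes each request declares which named variables it reads and writes, then derives request dependencies, a valid execution order, and a shared prompt prefix that a service could in principle compute once and reuse.

```python
import os
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Request:
    """One LLM call. The names in inputs/outputs stand in for Semantic Variables."""
    name: str
    prompt: str
    inputs: tuple = ()
    outputs: tuple = ()


def build_dependencies(requests):
    """Map each request to the set of requests whose outputs it consumes."""
    producer = {var: r.name for r in requests for var in r.outputs}
    return {
        r.name: {producer[v] for v in r.inputs if v in producer}
        for r in requests
    }


def schedule(requests):
    """Return request names in an order that respects the data flow."""
    return list(TopologicalSorter(build_dependencies(requests)).static_order())


def shared_prefix(requests):
    """Longest common prompt prefix: a candidate for avoiding redundant computation."""
    return os.path.commonprefix([r.prompt for r in requests])


# Hypothetical application: summarize two chunks, then merge the summaries.
SYSTEM = "You are a careful summarizer. "
requests = [
    Request("sum_a", SYSTEM + "Summarize: {chunk_a}",
            inputs=("chunk_a",), outputs=("summary_a",)),
    Request("sum_b", SYSTEM + "Summarize: {chunk_b}",
            inputs=("chunk_b",), outputs=("summary_b",)),
    Request("merge", SYSTEM + "Combine: {summary_a} {summary_b}",
            inputs=("summary_a", "summary_b"), outputs=("final",)),
]

deps = build_dependencies(requests)   # "merge" depends on both summaries
order = schedule(requests)            # "merge" is scheduled last
prefix = shared_prefix(requests)      # the system prompt shared by all three
```

In this sketch the two summarization requests have no dependency on each other, so a service that sees the variables could run them in parallel and feed their outputs directly into `merge` without a round trip to the client. A request-level API hides exactly this structure.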