Abstract:Graph Neural Networks (GNNs) have been increasingly adopted for graph analysis in web applications such as social networks. Yet, efficient GNN serving remains a critical challenge due to high workload fluctuations and intricate GNN operations. Serverless computing, thanks to its flexibility and agility, offers on-demand serving of GNN inference requests. Alas, the request-centric serverless model is still too coarse-grained to avoid resource waste. Observing the significant data locality in computation graphs of requests, we propose λGrapher, a serverless system for GNN serving that achieves resource efficiency through graph sharing and fine-grained resource allocation. "Grapher features the following designs: (1) adaptive timeout for request buffering to balance resource efficiency and inference latency, (2) graph-centric scheduling to minimize computation and memory redundancy, and (3) resource-centric function management with fine-grained resource allocation catered to the resource sensitivities of GNN operations and function orchestration optimized to hide communication latency. We implement a prototype of λGrapher based on the representative open-source serverless platform Knative and evaluate it with real-world traces from various web applications. Our results show that λGrapher can achieve an average savings of 61.5% in memory resource and 47.2% in computing resource compared with the state of the arts while ensuring GNN inference latency.

Λgrapher: A Resource-Efficient Serverless System for GNN Serving Through Graph Sharing