Abstract: Serverless computing has gained strong traction in the cloud computing community in recent years. Among the many benefits of this computing model, the rapid auto-scaling of user applications stands out. However, offering ad hoc scaling of user deployments at the function level introduces many complications for serverless systems. One prevalent shortcoming is the cold-start delay: the added latency and failures in function request executions caused by the time needed to dynamically create new resources to suit function workloads. Maintaining idle resource pools to alleviate this issue often wastes resources from the cloud provider's perspective. Existing solutions mostly focus on predicting and understanding function load levels in order to create the required resources proactively. Although these solutions improve function performance, they make scaling decisions without an understanding of overall system characteristics, which often leads to sub-optimal use of system resources. Further, the multi-tenant nature of serverless systems requires a scalable solution that adapts to multiple co-existing applications, a requirement that most current solutions do not meet. In this paper, we introduce a novel multi-agent Deep Reinforcement Learning based solution for both horizontal and vertical scaling of function resources, based on a comprehensive understanding of both function and system requirements. Our solution improves function performance by reducing cold starts, while also giving service providers the flexibility to optimize resource maintenance cost. Experiments conducted under varying workload scenarios show improvements of up to 23% in application latency and 34% in request failures, while also saving up to 45% in infrastructure cost for the service providers.
