Abstract: Serverless computing has gained strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling of user applications stands out. However, offering ad hoc scaling of user deployments at the function level introduces many complications for serverless systems. One prevalent shortcoming is the cold-start delay: the added delay and failures in function request executions caused by the time spent dynamically creating new resources to suit function workloads. Maintaining idle resource pools to alleviate this issue often wastes resources from the cloud provider's perspective. Existing solutions mostly focus on predicting and understanding function load levels in order to create the required resources proactively. Although these solutions improve function performance, they make scaling decisions without an understanding of overall system characteristics, which often leads to sub-optimal use of system resources. Further, the multi-tenant nature of serverless systems requires a scalable solution that adapts to multiple co-existing applications, a requirement that most current solutions do not meet. In this paper, we introduce a novel multi-agent Deep Reinforcement Learning based intelligent solution for both horizontal and vertical scaling of function resources, based on a comprehensive understanding of both function and system requirements. Our solution improves function performance by reducing cold starts, while also offering service providers the flexibility to optimize resource maintenance cost. Experiments conducted under varying workload scenarios show improvements of up to 23% and 34% in application latency and request failures, respectively, while also saving up to 45% in infrastructure cost for the service providers.
