Abstract: Serverless computing has emerged as an attractive paradigm for deploying cloud applications. It frees developers from managing infrastructure resources: they only configure necessary parameters such as latency and memory constraints. Existing resource-configuration solutions for cloud-based serverless applications fall broadly into two classes. The first builds models from historical data. The second combines sparse measurements with interpolation or modeling. To improve service responsiveness and conserve network bandwidth, serverless platforms have progressively expanded from the traditional cloud to the edge. Edge platforms have limited resources, so they often incur more runtime overhead than cloud platforms, and applying existing solutions there leads to undesirable financial costs for developers. Edge platforms are also heterogeneous: their pricing differs according to their resource preferences, which makes them extremely challenging to handle. To tackle these challenges, we propose FireFace, an adaptive and efficient approach consisting of a prediction module and a decision module. The prediction module extracts the internal features of every function in a serverless application. It uses this information to predict each function's execution time under a given configuration scheme. Building on these predictions, the decision module analyzes environment information and selects the most suitable configuration plan for each function, including its CPU, memory, and edge platform. It makes this selection with an Adaptive Particle Swarm Optimization algorithm combined with Genetic Algorithm operators (APSO-GA). In this way, FireFace effectively minimizes financial overhead while fulfilling Service Level Objectives (SLOs). Extensive experimental results show that our prediction model achieves the best results on all three metrics, with prediction error rates of 4.25∼9.51% for real-world serverless applications. Our approach finds the optimal resource configuration scheme for each application, saving 7.2∼44.8% on average compared with other classic algorithms. Moreover, FireFace adapts rapidly, efficiently adjusting resource allocation schemes in response to dynamic environments.
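The decision step described above (pick CPU, memory, and edge platform per function to minimize cost subject to an SLO, guided by predicted execution times) can be sketched as follows. This is a minimal illustration under stated assumptions, not FireFace's implementation. All of the following are hypothetical: the option grids, platform names and prices, the toy analytic latency model standing in for the learned prediction module, the assumption that functions run as a sequential chain, and every parameter value. The abstract does not specify the exact APSO-GA variant. Here the "adaptive" part is simplified to a linearly decreasing inertia weight, and the GA operators are a uniform crossover toward the global best plus random-reset mutation.

```python
import random
from dataclasses import dataclass
from itertools import product

# Hypothetical search space; the abstract does not give FireFace's real option grids.
CPUS = [0.5, 1.0, 2.0]     # vCPU options
MEMS = [256, 512, 1024]    # memory options in MB


@dataclass(frozen=True)
class Platform:
    name: str
    speed: float      # relative execution speed (hypothetical)
    cpu_price: float  # $ per vCPU-second (hypothetical)
    mem_price: float  # $ per GB-second (hypothetical)


PLATFORMS = [Platform("edge-a", 1.0, 2e-5, 2e-6),
             Platform("edge-b", 1.3, 3e-5, 1.5e-6)]
WORK = [1.2, 0.6, 2.0]     # per-function CPU-seconds at speed 1.0 (hypothetical)
MIN_MEM = [256, 512, 256]  # MB below which a function slows down (hypothetical)
SLO = 4.0                  # end-to-end latency bound in seconds (hypothetical)
DIMS = [len(CPUS), len(MEMS), len(PLATFORMS)]


def predict_time(f, cpu, mem, plat):
    """Toy stand-in for the prediction module, NOT FireFace's learned predictor."""
    t = WORK[f] / (cpu * plat.speed)
    return t * 3.0 if mem < MIN_MEM[f] else t


def evaluate(plan):
    """Return (latency, cost) for one (cpu_idx, mem_idx, plat_idx) per function.

    Functions are assumed to run as a sequential chain, so latencies add up.
    """
    latency = cost = 0.0
    for f, (c, m, p) in enumerate(plan):
        cpu, mem, plat = CPUS[c], MEMS[m], PLATFORMS[p]
        t = predict_time(f, cpu, mem, plat)
        latency += t
        cost += t * (cpu * plat.cpu_price + mem / 1024 * plat.mem_price)
    return latency, cost


def fitness(plan):
    latency, cost = evaluate(plan)
    # Any SLO violation ranks below every feasible plan; larger violations rank lower.
    return cost if latency <= SLO else 1e6 + (latency - SLO)


def decode(pos):
    """Map a continuous position in [0, 1)^(3n) to discrete option indices."""
    idx = [min(int(x * DIMS[i % 3]), DIMS[i % 3] - 1) for i, x in enumerate(pos)]
    return [tuple(idx[i:i + 3]) for i in range(0, len(idx), 3)]


def apso_ga(n_particles=30, iters=100, pc=0.3, pm=0.05, seed=0):
    rng = random.Random(seed)
    dim = 3 * len(WORK)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for it in range(iters):
        w = 0.9 - 0.5 * it / iters  # simplified adaptation: decreasing inertia weight
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d] + 2.0 * r1 * (pbest[i][d] - pos[i][d])
                     + 2.0 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-0.5, min(0.5, v))  # velocity clamp
                x = pos[i][d] + vel[i][d]
                if rng.random() < pc:  # GA crossover: inherit gene from global best
                    x = gbest[d]
                if rng.random() < pm:  # GA mutation: random reset
                    x = rng.random()
                pos[i][d] = min(max(x, 0.0), 0.999999)
            fit = fitness(decode(pos[i]))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest)


def brute_force():
    """Exhaustive search, only viable for tiny instances like this one."""
    options = list(product(range(len(CPUS)), range(len(MEMS)), range(len(PLATFORMS))))
    feasible = [p for p in product(options, repeat=len(WORK)) if evaluate(p)[0] <= SLO]
    return min(feasible, key=lambda p: evaluate(p)[1])


if __name__ == "__main__":
    plan = apso_ga()
    print(plan, evaluate(plan))
```

Because this toy instance has only 18^3 joint plans, `brute_force` can compute the true optimum for comparison. A swarm-based search only pays off when the joint space (options per function raised to the number of functions) is too large to enumerate.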

AsyFunc: A High-Performance and Resource-Efficient Serverless Inference System via Asymmetric Functions

Design and implementation of efficient distributed deep learning model inference architecture on serverless computation

AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs

Efficient Architecture Paradigm for Deep Learning Inference As a Service

Automating Cloud Deployment for Real-Time Online Foundation Model Inference

Decentralized Proactive Model Offloading and Resource Allocation for Split and Federated Learning

FSD-Inference: Fully Serverless Distributed Inference with Scalable Cloud Communication

FireFace: Leveraging Internal Function Features for Configuration of Functions on Serverless Edge Platforms

An Adaptive DNN Inference Acceleration Framework with End–edge–cloud Collaborative Computing

ΛDNN: Achieving Predictable Distributed DNN Training with Serverless Architectures

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

A Fine-Grained End-to-End Latency Optimization Framework for Wireless Collaborative Inference

Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing

AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency

DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference

FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge

FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping

CoFB: latency-constrained co-scheduling of flows and batches for deep learning inference service on the CPU–GPU system

Functions as a service for distributed deep neural network inference over the cloud‐to‐things continuum

AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance