Abstract: Modern latency-critical online services often rely on composing results from a large number of server components. Because a response cannot be returned until its slowest component finishes, the tail latency of these components (e.g., the 99th percentile of response time), rather than their average latency, determines the overall service performance. When hosted in a cloud environment, the components of a service are typically co-located with short batch jobs to increase machine utilization, and they share and contend with those jobs for resources such as caches and I/O bandwidth. Because batch jobs vary widely in workload type and input size, the performance interference they impose on individual components changes continuously, causing latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce tail latency, which degrades service performance as load increases. In this paper, we propose PCS, a predictive, component-level scheduling framework that reduces tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict component latency and overall service performance on different nodes. Based on these predictions, the scheduler identifies straggling components and computes near-optimal component-to-node allocations that adapt to the changing interference from batch jobs. Using realistic workloads, we demonstrate that the proposed scheduler reduces component tail latency by an average of 67.05% and average overall service latency by 64.16% compared with state-of-the-art tail-latency reduction techniques.
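The abstract does not describe PCS's analytical model or allocation algorithm, so the following is only a hypothetical sketch of the two ideas it relies on, not the paper's method. First, `service_tail_prob` shows why component tails dominate a fan-out service: assuming components are independent and each lands in its slow tail with probability p, a request that waits on n of them is slow with probability 1 - (1 - p)^n (for p = 0.01 and n = 100, about 63%). Second, `assign_components` is a simple greedy, straggler-first placement that assumes per-node p99 latency predictions (`predicted`) and free slots (`capacity`) are already available; all function names and inputs here are illustrative assumptions, and this greedy heuristic makes no near-optimality claim.

```python
from typing import Dict


def service_tail_prob(p_component: float, n_components: int) -> float:
    """Probability that a request fanned out to n independent components waits
    on at least one component that lands in its slow tail (probability p each)."""
    return 1.0 - (1.0 - p_component) ** n_components


def assign_components(predicted: Dict[str, Dict[str, float]],
                      capacity: Dict[str, int]) -> Dict[str, str]:
    """Greedy straggler-first placement (hypothetical, not the PCS algorithm).

    predicted[c][node]: assumed prediction of component c's p99 latency (ms)
    if placed on node. capacity[node]: number of free slots on that node.
    """
    free = dict(capacity)
    assignment: Dict[str, str] = {}
    remaining = sorted(predicted)  # sorted so ties are broken deterministically

    def best_option(c: str):
        # Fastest predicted (latency, node) for c among nodes with a free slot.
        options = [(lat, node) for node, lat in predicted[c].items()
                   if free.get(node, 0) > 0]
        if not options:
            raise ValueError(f"no free node left for component {c!r}")
        return min(options)

    while remaining:
        # Likely straggler: the component whose best remaining option is worst.
        straggler = max(remaining, key=lambda c: best_option(c)[0])
        _, node = best_option(straggler)
        assignment[straggler] = node
        free[node] -= 1
        remaining.remove(straggler)
    return assignment


def service_latency(predicted: Dict[str, Dict[str, float]],
                    assignment: Dict[str, str]) -> float:
    # A fan-out request completes only when its slowest component responds.
    return max(predicted[c][n] for c, n in assignment.items())
```

For example, with `predicted = {"A": {"n1": 10.0, "n2": 30.0}, "B": {"n1": 12.0, "n2": 50.0}, "C": {"n1": 40.0, "n2": 20.0}}` and `capacity = {"n1": 2, "n2": 1}`, component C (the likely straggler) is placed first on n2, A and B go to n1, and the predicted service latency is 20.0 ms. Placing the worst-off component first keeps it from losing its only fast node to a component that had good options elsewhere.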

TailCutter: Wisely Cutting Tail Latency in Cloud CDNs under Cost Constraints

PCS: Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services

Online Traffic Allocation Based on Percentile Charging for Practical CDNs

SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management

Collaborative Cloud and Edge Computing for Latency Minimization

A Practical Cross-Datacenter Fault-Tolerance Algorithm in the Cloud Storage System

Optimizing Multi-Cloud CDN Deployment and Scheduling Strategies Using Big Data Analysis

Tail-Learning: Adaptive Learning Method for Mitigating Tail Latency in Autonomous Edge Systems

Online Midgress-Sensitive Traffic Allocation for Percentile Charging in Practical CDNs

Towards Power Consumption-Delay Tradeoff by Workload Allocation in Cloud-Fog Computing

TailGuard: Tail Latency SLO Guaranteed Task Scheduling for Data-Intensive User-Facing Applications

Online Cost Minimization for Operating Geo-Distributed Cloud CDNs

STARFRONT: Cooperatively Constructing Pervasive and Low-Latency CDNs Upon Emerging LEO Satellites and Clouds

Adaptive Data Transmission in the Cloud

Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network

Where is the Traffic Going? A Comparative Study of Clouds Following Different Designs

DCloud: Deadline-Aware Resource Allocation for Cloud Computing Jobs

Reducing Tail Latencies Through Environment- and Neighbour-aware Thread Management

Energy Management in Cross-Domain Content Delivery Networks: A Theoretical Perspective

Measuring and Evaluating TCP Splitting for Cloud Services

RPC: Joint Online Reducer Placement and Coflow Bandwidth Scheduling for Clusters