Abstract: Over the last decade, we have witnessed growing data volumes generated and stored across geographically distributed datacenters. Processing such geo-distributed datasets may suffer from significant slowdown, as the underlying network flows have to traverse inter-datacenter networks with relatively low and highly heterogeneous available link bandwidth. Thus, optimizing the transmission of inter-datacenter flows, especially coflows that capture application-level semantics, is important for improving the communication performance of such geo-distributed applications. However, prior solutions for coflow scheduling have a significant limitation: they schedule coflows whose flow endpoints are already fixed, which makes them insufficient for optimizing the coflow completion time (CCT). In this article, we focus on the problem of jointly considering endpoint placement and coflow scheduling to minimize the average CCT of coflows across geo-distributed datacenters. To solve this problem without any prior knowledge of coflow arrivals, we present a coflow-aware optimization framework called SmartCoflow. In SmartCoflow, we first apply an approximation algorithm to obtain the endpoint placement and scheduling decisions for a single coflow. Based on the single-coflow solution, we then develop an efficient online algorithm to handle dynamically arriving coflows. Through rigorous theoretical analysis, we prove that SmartCoflow has a non-trivial competitive ratio. We also extend SmartCoflow to incorporate various design choices or requirements of applications and operators, such as enforcing an inter-datacenter bandwidth usage budget and considering coflow deadlines. Through experimental results from a testbed implementation and trace-driven simulations, we demonstrate that SmartCoflow can reduce the average CCT, lower bandwidth usage, and improve the coflow deadline meet rate when compared to the state-of-the-art scheduling-only method.
