Abstract: Storage clouds, such as Amazon S3, are widely used for web services and Internet applications. It has been observed that the delay for retrieving data from and placing data into the clouds is quite random and exhibits weak correlations between different read/write requests. This inspires us to investigate a key problem: can we reduce the delay by transmitting replicated data in parallel or by using powerful erasure codes? In this paper, we study the problem of reducing the delay of downloading data from cloud storage systems by leveraging multiple parallel threads, assuming that the data has been encoded and stored in the clouds using fixed-rate forward error correction (FEC) codes with parameters (n, k). That is, each file is divided into k equal-sized chunks, which are then expanded into n chunks such that any k of the n chunks are sufficient to restore the original file. The model can be depicted as a multiple-server queue with arrivals of data-retrieval requests, in which each server corresponds to a thread. However, this is not a typical queueing model, because a server can terminate its operation depending on when other servers complete their service (due to the redundancy that is spread across the threads). Hence, to the best of our knowledge, the analysis of this queueing model remains largely uncharted. Real traces from Amazon S3 show that the time to retrieve a fixed-size chunk is random and can be accurately approximated as an i.i.d. exponentially distributed random variable. We show that any work-conserving scheme is delay-optimal when k = 1. When k > 1, we find that a simple greedy scheme, which allocates all available threads to the head-of-line request, is delay-optimal, a result that may appear surprising.
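
To make the (n, k) retrieval model concrete, the following is a minimal sketch for a single request, not the paper's scheduling scheme. It assumes one thread per coded chunk (n threads in total), i.i.d. exponential chunk download times with rate mu as stated in the abstract, and no queueing of other requests. Under those assumptions the file is restored once the k-th fastest of the n downloads completes, and the mean of that k-th order statistic has a standard closed form. The function names and the parameter mu are illustrative and do not come from the paper.

```python
import random


def expected_kth_completion(n, k, mu=1.0):
    """Mean time until k of n parallel i.i.d. Exp(mu) chunk downloads finish.

    Uses the standard order-statistic result for exponentials:
    E[T_(k:n)] = sum over i = n-k+1 .. n of 1 / (i * mu).
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return sum(1.0 / (i * mu) for i in range(n - k + 1, n + 1))


def simulate_kth_completion(n, k, mu=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw one download time per thread; the request completes when
        # the k-th fastest chunk arrives, and the rest can be cancelled.
        times = sorted(rng.expovariate(mu) for _ in range(n))
        total += times[k - 1]
    return total / trials


if __name__ == "__main__":
    for n, k in [(2, 2), (4, 2)]:
        print(n, k, expected_kth_completion(n, k), simulate_kth_completion(n, k))
```

With mu = 1, fetching k = 2 uncoded chunks over 2 threads has a mean completion time of 1.5, while a (4, 2) code over 4 threads has a mean of 7/12, roughly 0.583. This isolated comparison ignores contention between requests for a limited pool of threads, which is the question the queueing analysis addresses.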

Scheduling for Time-Constrained Big-File Transfer Over Multiple Paths in Cloud Computing

A Novel Job Scheduling Model to Enhance Efficiency and Overall User Fairness of Cloud Computing Environment

On-demand File Transfer Algorithm Supporting Data-parallel over Multi-cluster

A Dynamical and Load-Balanced Flow Scheduling Approach for Big Data Centers in Clouds

Scheduling Multiple Workflows with Time Constraints onto Cloud Computing Resources

Dynamic Scheduling Algorithms for Large File Transfer on Multi-user Optical Grid Network Based on Efficiency and Fairness

A Scheduler System for Large-Scale Distributed Data Computing in Cloud

A Two-Step Data Placement and Task Scheduling Strategy for Optimizing Scientific Workflow Performance on Cloud Computing Platform

Elastic and Flexible Multi-Stage Task Scheduling with Deadline-Constraint in Clouds

Optimizing Distributed Networking with Big Data Scheduling and Cloud Computing

A New Block-Based Data Distribution Mechanism in Cloud Computing

Adaptive Multi-Path SnF Scheduling Method for Delay-Sensitive Transfers Across Inter-Datacenter Optical Networks

Cost-Efficient Scheduling of Bulk Transfers in Inter-Datacenter WANs

When Queueing Meets Coding: Optimal-latency Data Retrieving Scheme in Storage Clouds

Smart-blocking File Storage Method in Cloud Computing

A New Integer Programming Model for the File Transfer Scheduling Problem

A Clustering Based Coscheduling Strategy for Efficient Scientific Workflow Execution in Cloud Computing

Work in Progress: Topology-based Multilevel Algorithm for Large-scale Task Scheduling in Clouds

Block-Based Concurrent and Storage-Aware Data Streaming for Grid Applications with Lots of Small Files

A Novel Approach to Scheduling Workflows Upon Cloud Resources with Fluctuating Performance