Abstract: As the cloud platform becomes a promising alternative to traditional HPC (high performance computing) centers or in-house clusters, the I/O bottleneck problem is highlighted in this new environment, which typically pairs top-of-the-line compute instances with sub-par communication and I/O facilities. It has been observed that changing the cloud I/O system configuration, such as the choice of file system, the number of I/O servers, and their placement strategy, leads to considerable variation in the performance and cost efficiency of I/O-intensive parallel applications. However, storage system configuration is tedious and error-prone to do manually, even for expert users, leading to solutions that are grossly over-provisioned (low cost efficiency), substantially under-performing (poor performance) or, in the worst case, both. This paper proposes ACIC, a system that automatically searches for optimized I/O system configurations from many candidates for each individual application running on a given cloud platform. ACIC uses machine learning models to predict performance and cost. To tackle the high-dimensional parameter exploration space, we enable affordable, reusable, and incremental training on cloud platforms, guided by Plackett and Burman matrices for experiment design. Our evaluation results with four representative parallel applications indicate that ACIC consistently identifies optimal or near-optimal configurations among a large group of candidate settings. Compared with a commonly used baseline I/O configuration, the top ACIC-recommended configuration improves the applications' performance by a factor of up to 10.5 (3.1 on average) and reduces cost by up to 89 percent (51 percent on average). In addition, we carried out a small-scale user study for one of the test applications, which found that ACIC consistently beat the user and even the application's developer, often by a significant margin, in selecting optimized configurations.

Rethinking the Cloudonomics of Efficient I/O for Data-Intensive Analytics Applications

Towards Optimizing Storage Costs on the Cloud

Cost-Intelligent Data Analytics in the Cloud

Understanding I/O Performance Behaviors of Cloud Storage from a Client's Perspective

Saving Money for Analytical Workloads in the Cloud

Rethinking Storage Management for Data Processing Pipelines in Cloud Data Centers

Data Caching for Enterprise-Grade Petabyte-Scale OLAP

Cost Optimization for Cloud Storage from User Perspectives: Recent Advances, Taxonomy, and Survey

Moving Big Data to The Cloud: An Online Cost-Minimizing Approach

Optimizing Cloud Infrastructure for Real-time AI Processing: Challenges and Solutions

Energy-Efficient Data Processing in Cloud Computing Centers

Moving big data to the cloud

A Highly Practical Approach Toward Achieving Minimum Data Sets Storage Cost in the Cloud

Caching or Not: an Online Cost Optimization Algorithm for Geodistributed Data Analysis in Cloud Environments

One Optimized I/O Configuration Per HPC Application

A Cost-effective Framework for Running Industrial Big Data Analysis Applications in Public Clouds

Cost-Effective Cloud Server Provisioning for Predictable Performance of Big Data Analytics

Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications

To store or not: Online cost optimization for running big data jobs on the cloud

An Adaptive Approach to Better Load Balancing in a Consumer-Centric Cloud Environment