Abstract: Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to continuous server replacement. Meanwhile, datacenters are commonly shared by many users for quite different uses, and they often exhibit significant performance heterogeneity due to multi-tenant interference. Deploying MapReduce on such heterogeneous clusters makes it significantly harder to achieve good application performance than on in-house dedicated clusters. As most MapReduce implementations were originally designed for homogeneous environments, heterogeneity can cause significant performance deterioration in job execution despite existing optimizations on task scheduling and load balancing. In this paper, we observe that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different configurations to match the capabilities of heterogeneous nodes. To this end, we propose a self-adaptive task tuning approach, Ant, that automatically searches for the optimal configurations for individual tasks running on different nodes. In a heterogeneous cluster, Ant first divides nodes into a number of homogeneous subclusters based on their hardware configurations. It then treats each subcluster as a homogeneous cluster and independently applies the self-tuning algorithm to each of them. Ant finally configures tasks with randomly selected configurations and gradually improves task configurations by reproducing the configurations of the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid being trapped in a local optimum, Ant uses a genetic algorithm during adaptive task configuration.
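The tuning loop described above (start from random configurations, reproduce the best performers, discard the worst) can be illustrated with a small genetic-style search. This is only a minimal sketch under stated assumptions, not Ant's actual implementation: the parameter names `sort_buffer_mb` and `spill_percent`, their ranges, the wave and population sizes, and the `synthetic_time` cost function are all hypothetical stand-ins, since the abstract does not name the tuned parameters or how task performance is measured.

```python
import random

# Hypothetical search space: the abstract does not list the tuned parameters.
SEARCH_SPACE = {
    "sort_buffer_mb": (50, 400),
    "spill_percent": (50, 95),
}

def random_config(rng):
    """Draw one random task configuration from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    """Build a child by taking each parameter from one of two parents."""
    return {k: rng.choice((a[k], b[k])) for k in SEARCH_SPACE}

def mutate(cfg, rng, rate=0.2):
    """Re-draw each parameter with a small probability to keep exploring."""
    out = dict(cfg)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out

def tune(measure_time, waves=20, tasks_per_wave=8, keep=4, seed=0):
    """Each wave runs one task per configuration, keeps the fastest
    configurations, and replaces the rest with their offspring."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(tasks_per_wave)]
    best_cfg, best_time = None, float("inf")
    for _ in range(waves):
        scored = sorted(((measure_time(c), c) for c in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_time:
            best_time, best_cfg = scored[0]
        survivors = [c for _, c in scored[:keep]]  # discard poor performers
        children = []
        while len(survivors) + len(children) < tasks_per_wave:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    return best_cfg, best_time

def synthetic_time(cfg):
    """Stand-in for a measured task completion time (assumption)."""
    return abs(cfg["sort_buffer_mb"] - 200) + abs(cfg["spill_percent"] - 80) + 10

best_cfg, best_time = tune(synthetic_time)
```

In a real deployment, `measure_time` would be replaced by the observed completion time of a task run with that configuration, and one such loop would run independently per homogeneous subcluster.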
Experimental results on a heterogeneous physical cluster with varying hardware capabilities show that Ant improves the average job completion time by 31, 20, and 14 percent compared to stock Hadoop (Stock), customized Hadoop with industry recommendations (Heuristic), and a profiling-based configuration approach (Starfish), respectively. Furthermore, we extend Ant to virtual MapReduce clusters in a multi-tenant private cloud. Specifically, Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. It uses the k-means clustering algorithm to classify virtual nodes into configuration groups based on the measured dynamic interference. Experimental results on virtual clusters with varying interference show that Ant improves the average job completion time by 20, 15, and 11 percent compared to Stock, Heuristic, and Starfish, respectively.
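As an illustration of the grouping step for virtual nodes, the sketch below clusters nodes by the two statistics the abstract names, I/O rate and CPU steal time, using a plain k-means (Lloyd's algorithm) written from scratch. It is a hedged example rather than Ant's code: the node names, the measurement values, the min-max normalization, and the choice of k = 2 are all assumptions made for the example.

```python
import random

def normalize(points):
    """Min-max scale each feature to [0, 1] so neither metric dominates."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - l) / s for v, l, s in zip(p, lo, span)) for p in points]

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Hypothetical measurements: (I/O rate in MB/s, CPU steal time in percent).
nodes = {
    "vm1": (120.0, 1.0),
    "vm2": (115.0, 2.0),
    "vm3": (40.0, 20.0),
    "vm4": (45.0, 18.0),
}
labels = kmeans(normalize(list(nodes.values())), k=2)
groups = dict(zip(nodes, labels))
```

Nodes that share a label would then form one configuration group and be tuned together; because the interference is dynamic, the measurements and the clustering would need to be refreshed periodically.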

Improving MapReduce Performance by Data Prefetching in Heterogeneous or Shared Environments

Improving MapReduce Performance with Partial Speculative Execution

Scheduling Algorithm Based on Prefetching in MapReduce Clusters

HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters.

SHadoop: Improving MapReduce Performance by Optimizing Job Execution Mechanism in Hadoop Clusters

Matchmaking: A New MapReduce Scheduling Technique

The performance of MapReduce: an in-depth study

The Performance of MapReduce

Performance Optimization for Short MapReduce Job Execution in Hadoop

Improving MapReduce Performance in a Heterogeneous Cloud: A Measurement Study

ComMapReduce: an Improvement of MapReduce with Lightweight Communication Mechanisms

Improving MapReduce Performance via Heterogeneity-Load-Aware Partition Function

A Cache Sharing Mechanism Based on RDMA.

Correlation Based File Prefetching Approach for Hadoop

Optimization of RDMA-Based HDFS Data Distribution Mechanism.

Efficient Finer-Grained Incremental Processing with MapReduce for Big Data

MapReduce Performance Optimizing through Replica Placement Strategy

An Uncoupled Data Process and Transfer Model for MapReduce.

Research on Optimization Method of Merging and Prefetching for Massive Small Files in HDFS

Uncoupled MapReduce: A Balanced and Efficient Data Transfer Model

Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning