Learning-Based Characterizing and Modeling Performance Bottlenecks of Big Data Workloads

Zhongxin Guo, Zheng Hu, Chunhong Zhang, Youer Pu
DOI: https://doi.org/10.1109/hpcc-smartcity-dss.2016.0124
2016-01-01
Abstract: With the increasing demand for large-scale data analytics, understanding the performance bottlenecks of big data workloads has become critical for optimizing distributed platforms. Existing work has focused on qualitatively characterizing the behavior and performance of workloads; however, little effort has been spent on quantifying performance bottlenecks and building bottleneck models. In this paper, we define a series of bottleneck ratios that quantify bottlenecks based on resource utilization. Then, using features parsed from the original logs, we propose a stage-level modeling approach to characterize workload bottlenecks. With these models, bottleneck ratios can be estimated from the original logs alone, without collecting resource utilization data. To generalize the models to diverse workloads, we propose a workload generator, TrainBench, which can flexibly generate workloads with multifarious behaviors at the stage level. In addition, taking hardware performance into account, we extract three key features to improve estimation accuracy. Our bottleneck models perform well for diverse workloads across different clusters.
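The abstract does not state the exact formula for a bottleneck ratio or which model family is used at the stage level. The Python sketch below illustrates the two ideas under loudly stated assumptions that are not taken from the paper. First, the hypothetical `bottleneck_ratio` treats a resource's bottleneck ratio over a stage as the fraction of utilization samples at or above a saturation `threshold`. Second, the hypothetical `fit_linear` stands in for a stage-level model with a one-feature ordinary least-squares fit from a log-derived feature (for example, bytes read per task) to that ratio. Once such a fit exists, the ratio can be predicted from logs alone.

```python
# Illustrative sketch only: the ratio definition, the 0.9 threshold, and the
# single-feature linear model are assumptions, not the paper's method.


def bottleneck_ratio(samples: list[float], threshold: float = 0.9) -> float:
    """Hypothetical bottleneck ratio: fraction of sampling intervals in which
    a resource's utilization (0.0 to 1.0) is at or above `threshold`."""
    if not samples:
        raise ValueError("need at least one utilization sample")
    saturated = sum(1 for u in samples if u >= threshold)
    return saturated / len(samples)


def bottleneck_ratios(util: dict[str, list[float]],
                      threshold: float = 0.9) -> dict[str, float]:
    """Apply bottleneck_ratio to each resource (e.g. "cpu", "disk", "net")."""
    return {res: bottleneck_ratio(s, threshold) for res, s in util.items()}


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept, standing in for a
    stage-level model mapping one log-derived feature to a bottleneck ratio."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: tuple[float, float], x: float) -> float:
    """Estimate a bottleneck ratio from a log feature, clamped to [0, 1]."""
    slope, intercept = model
    return min(1.0, max(0.0, slope * x + intercept))
```

In this setup, utilization samples are needed only to produce training labels. Once `fit_linear` has been trained on stages with known ratios, `predict` needs just the log feature. A realistic model would use several features, including the hardware-related ones the abstract mentions.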