Abstract: As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architecture, the pressure to benchmark and evaluate these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data systems pose great challenges for big data benchmarking. Considering the broad use of big data systems, for the sake of fairness, big data benchmarks must include a diversity of data and workloads, which is the prerequisite for evaluating big data systems and architecture. Most state-of-the-art big data benchmarking efforts target specific types of applications or system software stacks, and hence they are not qualified to serve the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. Currently, we choose 19 big data benchmarks from the dimensions of application scenarios, operations/algorithms, data types, data sources, software stacks, and application types, and they are comprehensive for fairly measuring and evaluating big data systems and architecture. BigDataBench is publicly available from the project home page http://prof.ict.ac.cn/BigDataBench. Also, we comprehensively characterize the 19 big data workloads included in BigDataBench with varying data inputs.
On a typical state-of-practice processor, the Intel Xeon E5645, we have the following observations. First, in comparison with traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity, defined as the ratio of the total number of instructions to the total number of bytes of memory accesses. Second, the volume of data input has a non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache (L1 I) misses per 1000 instructions (in short, MPKI) of the big data applications are higher than in the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
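
The two metrics named in the abstract, operation intensity and MPKI, can be illustrated with a minimal Python sketch. The function names and counter values here are hypothetical and chosen only for illustration; they are not measurements from the paper, and the paper's own collection method (for example, which hardware counters were read) is not shown here.

```python
def operation_intensity(instructions: int, memory_bytes: int) -> float:
    """Ratio of total instructions to total bytes of memory accesses."""
    if memory_bytes <= 0:
        raise ValueError("memory_bytes must be positive")
    return instructions / memory_bytes


def mpki(misses: int, instructions: int) -> float:
    """Cache misses per 1000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


# Hypothetical counter values, for illustration only.
total_instructions = 2_000_000
total_memory_bytes = 4_000_000
l1i_misses = 23_000

intensity = operation_intensity(total_instructions, total_memory_bytes)
l1i_mpki = mpki(l1i_misses, 1_000_000)
print(f"operation intensity: {intensity} instructions/byte")
print(f"L1 I MPKI: {l1i_mpki}")
```

A lower operation intensity means each byte moved from memory supports fewer instructions, which is the sense in which the abstract describes big data applications as having very low operation intensity.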

BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing

HPC AI500: A Benchmark Suite for HPC AI Systems

AI-oriented Workload Allocation for Cloud-Edge Computing.

BPS: A Performance Metric of I/O System

A Numerical Model Oriented Large-scale Parallel I/O Optimization Method.

An Empirical Roofline Model for Extreme-Scale I/O Workload Analysis

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

Towards Optimizing Storage Costs on the Cloud

Cost-Performance Modeling with Automated Benchmarking on Elastic Computing Clouds

Characterization and Architectural Implications of Big Data Workloads

Loosely-Coupled Benchmark Framework Automates Performance Modeling on IaaS Clouds

Cloudrank-V: A Desktop Cloud Benchmark With Complex Workloads

Towards Energy-Proportional Computing Using Subsystem-Level Power Management

NO2: Speeding Up Parallel Processing of Massive Compute-Intensive Tasks

BigDataBench: A big data benchmark suite from internet services

Cloud Server Benchmarks for Performance Evaluation of New Hardware Architecture

BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework

An Energy-Efficient System on a Programmable Chip Platform for Cloud Applications.

Characterizing and Optimizing TPC-C Workloads on Large-Scale Systems Using SSD Arrays

Parallel Scientific Power Calculations in Cloud Data Center Based On Decomposition-Coordination Directed Acyclic Graph

AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers