Abstract: Big data workloads and runtime systems are growing rapidly, which poses a serious challenge to workload characterization, the foundation of innovative system and architecture design. Previous major efforts on big data benchmarking either propose a comprehensive but very large set of workloads, or select only a few workloads according to so-called popularity, which may lead to partial or even biased observations. In this paper, starting from a comprehensive big data benchmark suite, BigDataBench, we reduce 77 workloads to 17 representative workloads from a micro-architectural perspective. On a typical state-of-practice platform, the Intel Xeon E5645, we compare the representative big data workloads with SPECINT, SPECFP, PARSEC, CloudSuite and HPCC. After a comprehensive workload characterization, we make the following observations. First, big data workloads are dominated by data movement and have more branch operations, which account for up to 92% of the instruction mix; this places them in a different class from desktop (SPEC CPU2006), CMP (PARSEC), and HPC (HPCC) workloads. Second, corroborating previous work, Hadoop- and Spark-based big data workloads have higher front-end stalls. Compared with traditional workloads, i.e., PARSEC, the big data workloads have larger instruction footprints. We also note that, in addition to varied instruction-level parallelism, front-end efficiency differs significantly among big data workloads. Third, we find that complex software stacks that fail to use state-of-practice processors efficiently are one of the main factors leading to high front-end stalls. For the same workloads, L1I cache miss rates differ by an order of magnitude among implementations built on different software stacks.
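The two metrics the abstract leans on, the instruction-mix share and the L1I cache miss rate, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the counter names are hypothetical, "data movement" is approximated as loads plus stores, the miss rate is expressed as misses per thousand instructions (MPKI), and all numbers are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical counter totals for one workload run (illustrative names)."""
    instructions: int  # retired instructions
    loads: int         # retired load instructions
    stores: int        # retired store instructions
    branches: int      # retired branch instructions
    l1i_misses: int    # L1 instruction cache misses


def data_movement_and_branch_share(c: Counters) -> float:
    """Fraction of the instruction mix that is data movement or branches.

    Assumption: data movement is approximated as loads + stores.
    """
    return (c.loads + c.stores + c.branches) / c.instructions


def l1i_mpki(c: Counters) -> float:
    """L1I misses per thousand retired instructions."""
    return 1000.0 * c.l1i_misses / c.instructions


# Placeholder values for the same workload on two hypothetical software stacks.
stack_a = Counters(instructions=1_000_000, loads=300_000, stores=150_000,
                   branches=200_000, l1i_misses=20_000)
stack_b = Counters(instructions=1_000_000, loads=300_000, stores=150_000,
                   branches=200_000, l1i_misses=2_000)

share = data_movement_and_branch_share(stack_a)
mpki_ratio = l1i_mpki(stack_a) / l1i_mpki(stack_b)
print(f"data movement + branch share: {share:.2%}")
print(f"L1I MPKI ratio between stacks: {mpki_ratio:.1f}x")
```

Normalizing misses by instruction count, rather than comparing raw miss counts, is what makes implementations with different instruction counts comparable across software stacks.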

Characterizing Data Analytics Workloads on Intel Xeon Phi

Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R)

Test-driving Intel Xeon Phi

An Empirical Study of Intel Xeon Phi

Towards Modeling Energy Consumption of Xeon Phi

Characterizing and Optimizing Java-based HPC Applications on Intel Many-Core Architecture

Characterization and Architectural Implications of Big Data Workloads

An Early Performance Evaluation of OpenCL on Intel Xeon Phi

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications

Open JDK Meets Xeon Phi: A Comprehensive Study of Java HPC on Intel Many-Core Architecture

Understanding Big Data Analytic Workloads on Modern Processors

Exploring Synchronization in Cache Coherent Manycore Systems: A Case Study with Xeon Phi

Investigating Large-Scale Feature Matching Using the Intel(R) Xeon Phi(TM) Coprocessor

Accelerating Large-Scale Biological Database Search on Xeon Phi-based Neo-Heterogeneous Architectures

Experimentation Procedure for Offloaded Mini-Apps Executed on Cluster Architectures with Xeon Phi Accelerators

A Task Assignment Method for Phi Structure

Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Characterizing data analysis workloads in data centers

Optimizing and Auto-Tuning Scale-Free Sparse Matrix-Vector Multiplication on Intel Xeon Phi