Abstract: With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread-level parallelism from sequential programs has become crucial for improving performance. However, many sequential programs cannot be easily parallelized because of dependences. Several solutions have been proposed. Some make the optimistic assumption that such dependences rarely manifest at runtime; when this assumption is violated, recovery incurs very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, previous techniques cannot speed up a loop with frequently arising cross-iteration dependences. In this paper, we propose a compiler technique that uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration, each based on a different prediction of the values of the variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration execute simultaneously in separate memory states. After execution, if one of the versions is correct (i.e., its predicted values match the actual values), we merge its state with the state of the preceding iteration, because the dependence between the two iterations has been correctly resolved. The memory states of the incorrect versions are discarded entirely. Based on this idea, we further propose a runtime adaptive scheme that not only delivers good performance but also achieves better CPU utilization. We conducted experiments with 10 benchmark programs on a real machine. The results show that our technique achieves an average speedup of 1.7x across all benchmarks.
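
The key idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's compiler transformation: the loop body, the live-in variable `x`, the candidate set `PREDICTIONS`, and the function names are all hypothetical, chosen only to show the control flow. Each speculative version of iteration i+1 runs in its own private state alongside iteration i; the version whose predicted `x` matches the actual value is merged, and the others are discarded.

```python
from concurrent.futures import ThreadPoolExecutor
import copy

PREDICTIONS = (0, 1, 2)  # hypothetical candidate values for the live-in x

def body(state, i):
    """Toy loop body: iteration i reads and writes the live-in state['x']."""
    state["x"] = (state["x"] * 3 + i) % 4
    state["log"].append((i, state["x"]))
    return state

def sequential(n):
    """Reference execution with no speculation."""
    state = {"x": 1, "log": []}
    for i in range(n):
        body(state, i)
    return state

def speculative(n):
    """Run iteration i together with several guessed versions of iteration i+1."""
    state = {"x": 1, "log": []}
    hits = 0
    i = 0
    with ThreadPoolExecutor() as pool:
        while i < n:
            # The preceding iteration runs on its own copy of the state.
            prev = pool.submit(body, copy.deepcopy(state), i)
            # One speculative version of iteration i+1 per predicted x,
            # each in a separate private state with an empty write log.
            specs = {}
            if i + 1 < n:
                for p in PREDICTIONS:
                    specs[p] = pool.submit(body, {"x": p, "log": []}, i + 1)
            state = prev.result()
            winner = specs.get(state["x"])  # did any prediction match?
            if winner is not None:
                # Correct version: merge its writes into the committed state.
                spec_state = winner.result()
                state["x"] = spec_state["x"]
                state["log"].extend(spec_state["log"])
                hits += 1
                i += 2
            else:
                # All versions were wrong: discard them; iteration i+1 is
                # re-executed as the preceding iteration of the next round.
                i += 1
    return state, hits

if __name__ == "__main__":
    final_state, hits = speculative(10)
    print(final_state == sequential(10), hits)
```

In this toy, a misprediction costs only the discarded work, since the committed state is never touched by a speculative version. Threads are used here only to express the concurrency; in standard CPython the global interpreter lock means this sketch shows the scheme's structure rather than any real speedup.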

Varcatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-Core

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.

Hybrid Performance Modeling And Analyzing Of Parallel Systems

Detecting Performance Variance for Parallel Applications Without Source Code

Time-sharing Parallel Applications Through Performance-Targeted Feedback-Controlled Real-Time Scheduling.

Lightweight Noise Detection

A novel cross-layer framework for early-stage power delivery and architecture co-exploration.

Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems

VAPRO: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications

Characterizing Fine-Grain Parallelism on Modern Multicore Platform

Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference

Vapor: Virtual Machine Based Parallel Program Profiling Framework

NO2: Speeding Up Parallel Processing of Massive Compute-Intensive Tasks

ParaInsight: an Assistant for Quantitatively Analyzing Multi-granularity Parallel Region.

Speculative Parallelization Using State Separation and Multiple Value Prediction.

Scaling OLTP Applications on Commodity Multi-Core Platforms

Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures