Abstract: With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread-level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized because of dependences. Several solutions to this problem have been proposed. Some make the optimistic assumption that such dependences rarely manifest themselves at runtime; when this assumption is violated, recovery incurs very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, previous techniques cannot speed up a loop with frequently arising cross-iteration dependences. In this paper we propose a compiler technique that uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of the values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed simultaneously in separate memory states. After execution, if one of these versions is correct (i.e., its predicted values are found to be correct), we merge its state with the state of the preceding iteration, because the dependence between the two iterations has been correctly resolved. The memory states of the incorrect versions are discarded entirely. Based on this idea, we further propose a runtime adaptive scheme that not only delivers good performance but also achieves better CPU utilization. We conducted experiments with 10 benchmark programs on a real machine. The results show that our technique achieves a 1.7x speedup on average across all benchmarks.
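The key idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's compiler-generated code: the loop body, the names `body`, `sequential`, and `speculative`, the modulo-3 live-in variable `x`, and the use of a thread pool are all assumptions made for illustration. Each version of an iteration returns its writes in a private buffer (standing in for a separate memory state); the buffer of the version whose predicted live-in value matches the actual one is merged, and the others are dropped.

```python
from concurrent.futures import ThreadPoolExecutor

def body(i, x, data):
    """One loop iteration (hypothetical). Reads live-in x; returns (private writes, live-out x)."""
    writes = {("out", i): x + i}          # buffered privately, not written to shared memory
    return writes, (x + data[i]) % 3      # x carries the cross-iteration dependence

def sequential(data, x0=0):
    """Reference execution: iterations run one after another."""
    memory, x = {}, x0
    for i in range(len(data)):
        writes, x = body(i, x, data)
        memory.update(writes)
    return memory, x

def speculative(data, predictions, x0=0):
    """Overlap iteration i with several versions of iteration i+1, one per predicted x."""
    memory, x, i, hits = {}, x0, 0, 0
    n = len(data)
    with ThreadPoolExecutor() as pool:
        while i < n:
            current = pool.submit(body, i, x, data)        # non-speculative iteration i
            versions = {}
            if i + 1 < n:                                  # speculative versions of i+1
                versions = {p: pool.submit(body, i + 1, p, data) for p in predictions}
            writes, x = current.result()
            memory.update(writes)                          # commit iteration i
            i += 1
            if x in versions:                              # a prediction was correct
                spec_writes, x = versions[x].result()
                memory.update(spec_writes)                 # merge the correct version's state
                i += 1
                hits += 1
            # Buffers of all other versions are never merged, i.e. discarded.
    return memory, x, hits
```

Calling `speculative(data, [0, 1])` on a list of integers should produce the same memory and final `x` as `sequential(data)`, with `hits` counting the iterations that were committed from a correct speculative version. Python threads are used here only to show the control flow; because of the global interpreter lock this sketch is not expected to yield an actual speedup.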

Detecting Performance Variance for Parallel Applications Without Source Code

VAPRO: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications

vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection

Lightweight Noise Detection

Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems

Varcatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-Core

Hybrid Performance Modeling And Analyzing Of Parallel Systems

Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications

HPC System Software Enhanced by Source Code Analysis

Communication Analysis And Performance Prediction Of Parallel Applications On Large-Scale Machines

Identifying Scalability Bottlenecks for Large-Scale Parallel Programs with Graph Analysis

Vapor: Virtual Machine Based Parallel Program Profiling Framework

Performance Prediction for Large-Scale Parallel Applications Using Representative Replay

Offline Data Dependence Analysis to Facilitate Runtime Parallelism Extraction

ScalAna: Automating Scaling Loss Detection with Graph Analysis

Graph-Centric Performance Analysis for Large-Scale Parallel Applications

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Speculative Parallelization Using State Separation and Multiple Value Prediction

Visual Monitoring Environment for Parallel Processors

A Multivariate Characterization and Detection of Software Performance Antipatterns