Abstract: With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread-level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized due to the presence of dependences. Different solutions to this problem have been proposed. Some of them make the optimistic assumption that such dependences rarely manifest themselves at runtime; when this assumption is violated, however, recovery incurs very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are not able to speed up the execution. In this paper we propose a compiler technique that uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of the values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed simultaneously in separate memory states. After the execution, if one of these versions is correct (i.e., its predicted values are found to be correct), we merge its state with the state of the preceding iteration, because the dependence between the two iterations has been correctly resolved. The memory states of the other, incorrect versions are completely discarded. Based on this idea, we further propose a runtime adaptive scheme that not only delivers good performance but also achieves better CPU utilization. We conducted experiments with 10 benchmark programs on a real machine. The results show that our technique achieves a 1.7x speedup on average across all benchmarks used.
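The key idea in the abstract can be illustrated with a small sketch. It is not the paper's compiler transformation: the loop body, the predictor, and all names (`loop_body`, `predict`, `run_speculative`) are hypothetical, and "separate memory states" are approximated by private write buffers that are merged only when a prediction turns out to be correct.

```python
from concurrent.futures import ThreadPoolExecutor


def loop_body(i, live_in):
    """Hypothetical loop body. Returns (private writes, live-out value)."""
    live_out = (live_in + i) % 3               # cross-iteration dependence
    writes = {f"out[{i}]": live_in * 10 + i}   # buffered, not yet committed
    return writes, live_out


def predict(i):
    """Hypothetical multi-value predictor: candidate live-in values."""
    return [0, 1]                              # assumed; never predicts 2


def run_sequential(n, initial):
    """Reference execution with no speculation."""
    memory, live_in = {}, initial
    for i in range(n):
        writes, live_in = loop_body(i, live_in)
        memory.update(writes)
    return memory, live_in


def run_speculative(n, initial):
    memory, live_in, i = {}, initial, 0
    with ThreadPoolExecutor() as pool:
        while i < n:
            # Iteration i runs alongside one speculative version of
            # iteration i+1 per predicted live-in value, each in its own state.
            current = pool.submit(loop_body, i, live_in)
            versions = {}
            if i + 1 < n:
                versions = {v: pool.submit(loop_body, i + 1, v)
                            for v in predict(i + 1)}
            writes, live_in = current.result()
            memory.update(writes)              # commit iteration i
            i += 1
            results = {v: f.result() for v, f in versions.items()}
            if live_in in results:
                # A prediction matched the real value: merge that version's
                # state and skip re-executing iteration i+1.
                spec_writes, live_in = results[live_in]
                memory.update(spec_writes)
                i += 1
            # All other versions are discarded by never merging their writes.
            # On a miss, iteration i+1 simply runs as the next `current`.
    return memory, live_in
```

For any inputs, `run_speculative(n, initial)` should return the same memory and final live-in value as `run_sequential(n, initial)`, since a speculative version is merged only when its predicted live-in equals the actual one. Python threads are used here only to show the structure; they do not by themselves provide the parallel speedup reported in the abstract.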

Characterizing Fine-Grain Parallelism on Modern Multicore Platform

SoC performance modeling methodology and implementation based on transaction dataflow

Hybrid Performance Modeling And Analyzing Of Parallel Systems

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

Multi-Threading Performance on Commodity Multi-core Processors

Position-aware Thread-Level Speculative Parallelization for Large-Scale Chip-Multiprocessor

Parallelization of Bayesian Network Based SNPs Pattern Analysis and Performance Characterization on SMP/HT

The performance model of Hyper-Threading Technology in Intel Nehalem microarchitecture

Potential Thread-Level-parallelism Exploration with Superblock Reordering

Scaling OLTP Applications on Commodity Multi-Core Platforms

Automatic parallelization of fine-grained metafunctions on a chip multiprocessor

Reverse Compilation for Speculative Parallel Threading

Speculative Parallelization Using State Separation and Multiple Value Prediction

Speculative Parallelization of Sequential Loops on Multicores

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Exploring Potential Parallelism of Sequential Programs with Superblock Reordering

Tscale: A Contention-Aware Multithreaded Framework for Multicore Multiprocessor Systems

ON PARALLEL PROGRAMMING AND OPTIMISATION FOR MULTI-CORE

NestedMP: Taming Complex Configuration Space of Degree of Parallelism for Nested-Parallel Programs

Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures