Abstract: With the number of cores increasing rapidly but the performance per core increasing slowly at best, software must be parallelized in order to improve performance. Manual parallelization is often prohibitively time-consuming and error-prone (especially due to data races and memory-consistency complexities), and some portions of code may simply be too difficult to understand or refactor for parallelization. Most existing automatic parallelization techniques are performed statically at compile time and require source code to be analyzed, leaving a large fraction of software behind. In many cases, some or all of the source code and development tool chain is lost or, in the case of third-party software, was never available. Furthermore, modern applications are assembled and defined at run time, making use of shared libraries, virtual functions, plugins, dynamically generated code, and other dynamic mechanisms, as well as multiple languages. All these aspects of separate compilation prevent the compiler from obtaining a holistic view of the program, leading to the risk of incompatible parallelization techniques, subtle data races, and resource over-subscription. All the above considerations motivate dynamic binary parallelization (DBP). This dissertation explores the novel idea of trace-based DBP, which provides a large instruction window without introducing spurious dependencies. We hypothesize that traces provide a generally good trade-off between code visibility and analysis accuracy for a wide variety of applications so as to achieve better parallel performance. Compared to the raw dynamic instruction stream (DIS), traces expose more distant parallelism opportunities because their average length is typically much larger than the size of the hardware instruction window. Compared to the complete control flow graph (CFG), traces contain only the control and data dependencies on the execution path that is actually taken.
More importantly, while DIS-based DBP typically only exploits fine-grained parallelism and CFG-based DBP typically only exploits coarse-grained parallelism, traces can be used as a unified representation of program execution to seamlessly incorporate the exploitation of both coarse- and fine-grained parallelism. We develop Tracy, an innovative DBP framework which monitors a program at run time and

Trace-Based Dynamic Binary Parallelization
