Trace-Based Dynamic Binary Parallelization

Jing Yang
DOI: https://doi.org/10.18130/v3479c
2012-01-01
Abstract:With the number of cores increasing rapidly but the performance per core increasing slowly at best, software must be parallelized in order to improve performance. Manual parallelization is often prohibitively time-consuming and error-prone (especially due to data races and memory-consistency complexities), and some portions of code may simply be too difficult to understand or refactor for parallelization. Most existing automatic parallelization techniques are performed statically at compile time and require source code to be analyzed, leaving a large fraction of software behind. In many cases, some or all of the source code and development tool chain is lost or, in the case of third-party software, was never available. Furthermore, modern applications are assembled and defined at run time, making use of shared libraries, virtual functions, plugins, dynamically-generated code, and other dynamic mechanisms, as well as multiple languages. All these aspects of separate compilation prevent the compiler from obtaining a holistic view of the program, leading to the risk of incompatible parallelization techniques, subtle data races, and resource over-subscription. All the above considerations motivate dynamic binary parallelization (DBP). This dissertation explores the novel idea of trace-based DBP, which provides a large instruction window without introducing spurious dependencies. We hypothesize that traces provide a generally good trade-off between code visibility and analysis accuracy for a wide variety of applications so as to achieve better parallel performance. Compared to the raw dynamic instruction stream (DIS), traces expose more distant parallelism opportunities because their average length is typically much larger than the size of the hardware instruction window. Compared to the complete control flow graph (CFG), traces only contain control and data dependencies on the execution path which is actually taken. More importantly, while DIS-based DBP typically only exploits fine-grained parallelism and CFG-based DBP typically only exploits coarse-grained parallelism, traces can be used as a unified representation of program execution to seamlessly incorporate the exploitation of both coarseand fine-grained parallelism. We develop Tracy, an innovative DBP framework which monitors a program at run time and
Computer Science
What problem does this paper attempt to address?