Control-Flow Decoupling

Rami Sheikh,James Tuck,E. Rotenberg
DOI: https://doi.org/10.1109/MICRO.2012.38
2012-12-01
Micro
Abstract:Mobile and PC/server class processor companies continue to roll out flagship core micro architectures that are faster than their predecessors. Meanwhile placing more cores on a chip coupled with constant supply voltage puts per-core energy consumption at a premium. Hence, the challenge is to find future micro architecture optimizations that not only increase performance but also conserve energy. Eliminating branch mispredictions -- which waste both time and energy -- is valuable in this respect. We first explore the control-flow landscape by characterizing mispredictions in four benchmark suites. We find that a third of mispredictions-per-1K-instructions (MPKI) come from what we call separable branches: branches with large control-dependent regions (not suitable for if-conversion), whose backward slices do not depend on their control-dependent instructions or have only a short dependence. We propose control-flow decoupling (CFD) to eradicate mispredictions of separable branches. The idea is to separate the loop containing the branch into two loops: the first contains only the branch's predicate computation and the second contains the branch and its control-dependent instructions. The first loop communicates branch outcomes to the second loop through an architectural queue. Micro architecturally, the queue resides in the fetch unit to drive timely, non-speculative fetching or skipping of successive dynamic instances of the control-dependent region. Either the programmer or compiler can transform a loop for CFD, and we evaluate both. On a micro architecture configured similar to Intel's Sandy Bridge core, CFD increases performance by up to 43%, and reduces energy consumption by up to 41%. Moreover, for some applications, CFD is a necessary catalyst for future complexity-effective large-window architectures to tolerate memory latency.
What problem does this paper attempt to address?