Iteration Interleaving-Based SIMD Lane Partition

Yaohua Wang, Dong Wang, Shuming Chen, Zonglin Liu, Shenggang Chen, Xiaowen Chen, Xu Zhou
DOI: https://doi.org/10.1145/2847253
IF: 1.444
2016-01-01
ACM Transactions on Architecture and Code Optimization
Abstract: The efficacy of single instruction, multiple data (SIMD) architectures is limited when handling divergent control flows. This circumstance results in SIMD fragments using only a subset of the available lanes. We propose an iteration interleaving-based SIMD lane partition (IISLP) architecture that interleaves the execution of consecutive iterations and dynamically partitions SIMD lanes into branch paths with comparable execution time. The benefits are twofold: SIMD fragments under divergent branches can execute in parallel, and the pathology of fragment starvation can also be effectively eliminated. Our experiments show that IISLP doubles the performance of a baseline mechanism and provides a speedup of 28% versus instruction shuffle.
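The lane underutilization described in the abstract can be illustrated with a toy model. The sketch below is not the IISLP mechanism or the paper's baseline; it only assumes conventional masked execution, where a SIMD group that diverges runs both branch paths one after the other with inactive lanes idle. The function name, the per-path cycle costs, and the branch pattern are all hypothetical values chosen for illustration.

```python
def masked_utilization(takes_if, width, cost_if, cost_else):
    """Fraction of lane-cycles doing useful work under masked execution.

    takes_if  -- one boolean per loop iteration: True if it takes the 'if' path
    width     -- number of SIMD lanes (assumed to divide len(takes_if))
    cost_if   -- hypothetical cycle cost of the 'if' path
    cost_else -- hypothetical cycle cost of the 'else' path
    """
    busy = 0   # lane-cycles spent on useful work
    total = 0  # lane-cycles occupied by the group, busy or idle
    for start in range(0, len(takes_if), width):
        group = takes_if[start:start + width]
        n_if = sum(group)
        n_else = len(group) - n_if
        # A divergent group executes each taken path serially under a mask.
        cycles = (cost_if if n_if else 0) + (cost_else if n_else else 0)
        busy += n_if * cost_if + n_else * cost_else
        total += width * cycles
    return busy / total


# Hypothetical branch outcomes for 8 consecutive iterations on 4 lanes.
pattern = [True, False, True, True, False, False, True, False]
print(masked_utilization(pattern, width=4, cost_if=10, cost_else=10))
print(masked_utilization([True] * 8, width=4, cost_if=10, cost_else=10))
```

In this model every divergent group pays for both paths while each lane works on only one of them, which is the idle capacity that running the fragments of both paths in parallel on separate lane partitions would aim to recover.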