Performance Optimization for Parallel Systems with Shared DWM Via Retiming, Loop Scheduling, and Data Placement
Siyuan Gao, Shouzhen Gu, Rui Xu, Edwin Hsing-Mean Sha, Qingfeng Zhuge
DOI: https://doi.org/10.1016/j.sysarc.2020.101842
IF: 5.836
2020-01-01
Journal of Systems Architecture
Abstract: Domain Wall Memory (DWM) is an ideal candidate for replacing traditional memories, especially in parallel systems, because it offers low leakage power, high density, and low access latency. However, because of DWM's tape-like architecture, shift operations have a major impact on performance. For data-intensive applications with many loops and arrays, increasing loop parallelism and choosing an appropriate loop schedule and data placement on DWM can significantly improve the performance of parallel systems. This paper explores optimizing the performance of parallel systems through retiming, loop scheduling, and data placement, especially when the data are arrays. It proposes an Integer Linear Programming (ILP) formulation and a Scheduling While Placing (SWP) algorithm to generate optimal or near-optimal loop schedules and data placements with minimum execution time. Experimental results show that SWP and ILP effectively reduce execution time compared with the greedy List Scheduling First Access First Place (LF) algorithm. In addition, the paper proposes the Threshold Retiming Repetition (TRR) algorithm to combine retiming with SWP and ILP. Experimental results show that SWP+TRR and ILP+TRR further reduce execution time compared with the results obtained without retiming.
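To illustrate why data placement matters on a tape-like memory, the following is a minimal sketch and not the paper's ILP, SWP, LF, or TRR method. It assumes a simplified cost model that the abstract does not specify: a single tape with one access port, where reaching a data item costs one shift per position moved. The names `shift_cost` and `best_placement` and the sample access sequence are hypothetical and chosen only for this example.

```python
from itertools import permutations


def shift_cost(placement, accesses, start=0):
    """Count shifts for an access sequence on one single-port tape.

    Assumed model (not from the paper): moving the port from position i
    to position j costs |i - j| shifts.
    """
    position = {name: idx for idx, name in enumerate(placement)}
    head = start
    total = 0
    for name in accesses:
        target = position[name]
        total += abs(target - head)
        head = target
    return total


def best_placement(items, accesses):
    """Exhaustively search all placements; feasible only for tiny inputs."""
    return min(permutations(items), key=lambda p: shift_cost(p, accesses))


# Hypothetical access sequence produced by some loop schedule.
accesses = ["A", "C", "A", "C", "B"]
naive = ("A", "B", "C")

print(shift_cost(naive, accesses))        # 7 shifts
best = best_placement(naive, accesses)
print(best, shift_cost(best, accesses))   # ('A', 'C', 'B') 4
```

Placing the two frequently alternating items next to each other cuts the shift count from 7 to 4 in this toy case. Exhaustive search grows factorially with the number of items, which suggests why an ILP formulation or a heuristic would be needed at realistic sizes.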