Mapping Loops of multimedia algorithms for Coarse-grained reconfigurable architectures

Ziyu Yang, Peng Zhao, Guanwu Wang, Sikun Li
DOI: https://doi.org/10.1109/ICSPCC.2012.6335722
2012-01-01
Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are widely used as coprocessors to accelerate data-intensive applications. However, parallelizing sequential programs and optimizing critical loops remain challenging, since the access delay introduced by the massive memory accesses in those loops has become the bottleneck of CGRA performance. In this paper we focus on the parallel optimization of applications by mapping critical loops under the CGRA's resource constraints. We first propose a novel approach to parallelize loops by multi-level tiling. Then a genetic algorithm is introduced to schedule the tiled loops with memory-aware objective functions. Data locality and communication cost are optimized during parallel processing as well. Experimental results show that our approach generates more effective parallel tasks, improving data locality and load balance, while obtaining a 9.6% better speedup than memory-unaware parallel processing.
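As a rough illustration of what multi-level tiling does to a loop nest, the sketch below splits a 2-D iteration space into outer tiles (which could be treated as candidate parallel tasks) and inner tiles (which could be sized to fit a local memory). It is not the authors' algorithm: the function name `two_level_tiles`, the tile sizes, and the interpretation of each level are assumptions made for this example only.

```python
def two_level_tiles(n, m, outer, inner):
    """Return the (i, j) visiting order of an n x m loop nest under two tiling levels.

    Hypothetical helper for illustration; `outer` and `inner` are assumed tile
    edge lengths with inner <= outer.
    """
    order = []
    # Level 1: coarse tiles, e.g. units of work that could be scheduled in parallel.
    for io in range(0, n, outer):
        for jo in range(0, m, outer):
            i_end, j_end = min(io + outer, n), min(jo + outer, m)
            # Level 2: fine tiles inside each coarse tile, e.g. sized for local memory.
            for ii in range(io, i_end, inner):
                for ji in range(jo, j_end, inner):
                    # Point loops: iterate the elements of one fine tile.
                    for i in range(ii, min(ii + inner, i_end)):
                        for j in range(ji, min(ji + inner, j_end)):
                            order.append((i, j))
    return order


if __name__ == "__main__":
    visited = two_level_tiles(5, 7, outer=4, inner=2)
    # Tiling only reorders iterations: every point is visited exactly once.
    print(len(visited), len(set(visited)))
```

Tiling changes only the order in which iterations run, so consecutive iterations touch nearby data; a scheduler such as the genetic algorithm mentioned in the abstract would then decide where and when each tile executes.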