Nest-loop Transformation Techniques Considering Timing and Memory Optimization for Embedded Systems

Edwin H.-M. Sha, Meilin Liu
2006-01-01
Abstract: Embedded systems are usually application-specific and tightly constrained in timing, power, area, memory, and other resources. Embedded DSP applications usually perform their intensive computations in loops, so their performance and code quality depend largely on the performance and code quality of those loops. Loop transformation techniques such as fusion, distribution, unrolling, tiling, permutation, and combinations of these offer a good opportunity to improve loop performance. Integrating these techniques into the compiler framework is therefore important if the compiler is to generate high-quality code for embedded systems. This dissertation develops models, methodologies, and algorithms for loop transformation techniques, including loop fusion, loop distribution, loop permutation, and their combination, to optimize the execution of loops on embedded systems. The proposed techniques are based on data dependence analysis, a fundamental understanding of the basic properties of loops, retiming, and code size. Building on the basic properties of nested loops, we established the theoretical foundations and developed novel algorithms for these transformations. The experimental results showed that our loop transformation techniques, which consider timing, code size, and data locality, significantly improved the performance of the compiled code of DSP applications.

In particular, we proposed a general technique for legalizing loop fusion that maximizes the opportunity for fusion when fusion-preventing dependences exist. We proved that all the loops can be fused after a proper transformation, and we proposed a graph transformation process, based on retiming concepts, that eliminates the fusion-preventing dependences. We then proposed an algorithm that automatically generates the code of the fused loops and a formula that computes their resultant code size. By analyzing the relationship between the retiming values and the resultant code size, we proposed an improved technique, the Select Loop Fusion Technique, which selects the best dimension along which to legalize loop fusion so that the code size of the fused loops is minimal. Finally, we proposed a novel technique that combines loop fusion, loop distribution, and loop permutation to improve the timing performance of the fused loops without jeopardizing code size for embedded systems. The proposed techniques complement state-of-the-art loop transformation techniques.
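
The notion of a fusion-preventing dependence can be made concrete with a small sketch. The Python below is not the dissertation's algorithm; it is a hypothetical one-dimensional illustration in which the second loop reads `a[i + 1]`, a value the first loop has not yet produced at iteration `i`. Fusing the two loops directly gives wrong results, while delaying the second statement by one iteration (a shift that plays the role retiming plays in the abstract) makes the fusion legal. All function names are illustrative assumptions, and the input is assumed to have at least one element.

```python
def separate_loops(x):
    """Reference semantics: two loops executed one after the other."""
    n = len(x)
    a = [0] * n
    b = [0] * (n - 1)
    for i in range(n):          # loop 1 produces a[i]
        a[i] = 2 * x[i]
    for i in range(n - 1):      # loop 2 consumes a[i] and a[i + 1]
        b[i] = a[i] + a[i + 1]
    return a, b


def naive_fusion(x):
    """Illegal fusion: b[i] reads a[i + 1] before it has been written."""
    n = len(x)
    a = [0] * n
    b = [0] * (n - 1)
    for i in range(n):
        a[i] = 2 * x[i]
        if i < n - 1:
            b[i] = a[i] + a[i + 1]   # a[i + 1] still holds its initial 0
    return a, b


def shifted_fusion(x):
    """Legal fusion: the second statement is delayed by one iteration.

    Assumes len(x) >= 1. The statement peeled out in front of the loop
    is the prologue that the shift introduces.
    """
    n = len(x)
    a = [0] * n
    b = [0] * (n - 1)
    a[0] = 2 * x[0]                  # prologue: iteration 0 of statement 1
    for i in range(1, n):
        a[i] = 2 * x[i]
        b[i - 1] = a[i - 1] + a[i]   # statement 2, shifted by one iteration
    return a, b
```

The peeled prologue statement is extra code that the unfused loops did not need, which hints at why the abstract treats the resultant code size as a quantity to compute and minimize when choosing how to legalize fusion.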