Constructing time optimal schedules for pdimensional fixed-size systolic array

Hui Li,MuYi Chen,Peng Yi,Bingqiang Wang
DOI: https://doi.org/10.1109/ISCIT.2006.339967
2006-01-01
Abstract:An improved method for solving the well-known conflict-free scheduling problem for the locally sequential, globally parallel (LSGP) case of systolic array synthesis is proposed. A loop nest and a linear mapping to virtual processors is given, as is the clustering of rectangular arrangements of virtual processors into physical processors. A solution to the scheduling problem is a linear map of iteration indices to time that satisfies linear inequality constraints determined by data dependences. The schedule is conflict-free if no two iterations are scheduled simultaneously on the same processor. Say such a schedule juggles. It is tight if it juggles and, in the steady state, all processors are busy every cycle. Darte had provided a closed form solution that enables the enumeration of all tight schedules when the dimension of the array is (n -1) . Based on the (n - 1) dimensional tight scheduling algorithm (DTSA), this paper presents a 2-step construction algorithm (2SCA) for the scheduling of any p<(n - 1) dimensional processor array. The schedule set Omega generated by 2SCA is a subset of the tight schedule set Psi generated by DTSA. It is proved that the loop execution time corresponding to every schedule from Omega achieves the optimal bound. Because the order of Omega is just 1/Cp n-1 of Psi, the enumeration searching time for 2SCA is reduced to 1/Cp n-1 for the DTSA. Moreover, the execution time corresponding to all the schedules in Omega have the same most significant coefficient item, thus enumeration of Omega for a time-optimal schedule is eliminated, which further reduces the complexity. Finally, by the regulations property of 2SCA, the schedules in Omega lead to simpler and more efficient hardware-description code and hardware interface implementation
What problem does this paper attempt to address?