Optimizing the SIMD Parallelism Through Bitwidth Analysis
ZHANG Wei-Hua, ZHU Jia-Hua, ZHANG Hong-Jiang, ZANG Bin-Yu
DOI: https://doi.org/10.3724/SP.J.1016.2009.02168
2009-01-01
Chinese Journal of Computers
Abstract: Although SIMD units are widely used across many architecture designs, automatic optimization for these architectures is still immature. Because most SIMD optimizations are transplanted from traditional vectorization techniques, many features specific to SIMD architectures, such as packed operations, have not been thoroughly considered. When operands are tightly packed within a register, there is no spare space to record an overflow. To preserve the correctness of automatically SIMDized programs, operands should therefore be unpacked to leave enough room for interim overflow. However, this strategy incurs significant overhead, and the additional instructions for handling overflow can sometimes prevent other optimizations. This paper proposes a new technique, BCSA (Bitwidth controlled SIMD arithmetic), which reduces the negative effects of interim-overflow handling and eliminates the interference caused by interim overflows. The algorithm is applied to the Berkeley multimedia benchmarks, and the experimental results show that it significantly improves the performance of multimedia applications.
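The interim-overflow problem described above can be illustrated with a small sketch. The code below is an assumption-laden model, not the BCSA algorithm: it represents four unsigned 8-bit lanes as a Python list instead of using real SIMD intrinsics. The lanes wrap modulo 256, as a non-saturating packed add such as x86 `paddb` does. The sketch contrasts an average computed directly in 8-bit lanes, which can overflow, with an "unpacked" average that has room for the 9-bit interim sum. It then shows how a simple bitwidth bound can prove that the packed form is safe. All helper names (`packed_average`, `can_stay_packed`, and so on) are hypothetical.

```python
# Hypothetical model of packed SIMD arithmetic: four unsigned 8-bit lanes
# stored in a Python list (not real SIMD intrinsics).
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def packed_add(a, b):
    """Lane-wise add that wraps at 8 bits, like a non-saturating packed add."""
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def packed_average(a, b):
    """(a + b) >> 1 computed entirely in 8-bit lanes: the interim sum may overflow."""
    return [s >> 1 for s in packed_add(a, b)]


def unpacked_average(a, b):
    """Widen each lane before adding (Python ints stand in for 16-bit lanes),
    so the 9-bit interim sum is preserved, then shift back into range."""
    return [(x + y) >> 1 for x, y in zip(a, b)]


def max_bits(values):
    """Crude stand-in for bitwidth analysis: bits needed by the largest value."""
    return max(values).bit_length()


def can_stay_packed(a_bits, b_bits):
    """An unsigned sum of a p-bit and a q-bit value needs at most max(p, q) + 1 bits."""
    return max(a_bits, b_bits) + 1 <= LANE_BITS


if __name__ == "__main__":
    a, b = [200, 10, 255, 64], [100, 20, 1, 64]
    print(packed_average(a, b))    # [22, 15, 0, 64]   wrong: interim sums overflowed
    print(unpacked_average(a, b))  # [150, 15, 128, 64] exact, but needs unpacking

    # If analysis proves every lane fits in 7 bits, the 8-bit packed form is safe.
    c, d = [100, 10, 127, 64], [27, 20, 1, 64]
    print(can_stay_packed(max_bits(c), max_bits(d)))        # True
    print(packed_average(c, d) == unpacked_average(c, d))  # True
```

In this model, the unpacked version is always correct but would cost extra widen and narrow instructions on real hardware. A bitwidth bound lets a compiler skip that cost when it can prove that no lane's interim result exceeds the lane width. The abstract names this trade-off but does not describe BCSA's actual analysis.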