Cache-Friendly Design for Complex Spatially-Variable Coefficient Stencils on Many-Core Architectures

Jiarui Fang, Haohuan Fu, Guangwen Yang
DOI: https://doi.org/10.1109/hipc.2016.034
2016-01-01
Abstract: Many-core architectures, such as the NVIDIA graphics processing unit and Intel Xeon Phi, which are characterized by high computation resources but limited on-chip memory capacity, have been used to significantly accelerate various computationally demanding tasks. Stencil operators are naturally suitable for such architectures because of their parallel calculation patterns. However, only simple stencils with points distributed along the axes and with constant coefficients have been fully investigated. This study first provides insights into optimization strategies for stencils with complex shapes, including off-axial points and spatially variable coefficients. Through our proposed stencil-decomposition schemes, we maintain read-only coefficients in on-chip caches to avoid unvectorized memory access. To alleviate the resulting severe cache starvation, a generalized cache-friendly design for many-core architectures is proposed. It reduces both the number of cache misses and cache space consumption. The proposed methodology significantly improves the performance of stencil operations in a real seismic imaging application and introduces a new option for writing highly efficient memory-bound stencil-like loops.
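
To make the terminology concrete, the following is a minimal sketch of what a stencil with off-axial points and spatially variable coefficients computes. It is not the authors' decomposition scheme or cache design, which the abstract does not detail. The 2D nine-point shape, the `OFFSETS` table, the `apply_stencil` function, and the one-coefficient-grid-per-stencil-point layout are all assumptions chosen for illustration.

```python
# Illustrative only: a naive 2D stencil with off-axial (diagonal) points and
# a separate coefficient value at every grid point. All names and the data
# layout are hypothetical, not taken from the paper.

# Stencil point offsets (di, dj): center, four axial, four off-axial.
OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]


def apply_stencil(u, coeff):
    """Return v with v[i][j] = sum over k of coeff[k][i][j] * u[i+di_k][j+dj_k].

    u     : 2D list (ny x nx) of input values
    coeff : list of len(OFFSETS) 2D lists, one coefficient grid per stencil point
    Boundary cells are left at 0.0 to keep the sketch short.
    """
    ny, nx = len(u), len(u[0])
    v = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            acc = 0.0
            for k, (di, dj) in enumerate(OFFSETS):
                # Each term reads one coefficient AND one input value.
                acc += coeff[k][i][j] * u[i + di][j + dj]
            v[i][j] = acc
    return v
```

In this naive form, every output point reads nine coefficient values in addition to nine input values, whereas a constant-coefficient stencil reads only the input values. That extra read-only traffic is the kind of pressure on limited on-chip memory that the abstract describes.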