An Arbitrary N-dimensional Stencil Transformer in Cilk++

Yuan Tang, Steven Bartel, Dina Kachintseva
2010-01-01
Abstract: Stencil computations arise directly from solving partial differential equations (PDEs). The conventional way to solve a stencil problem is with nested loops that sweep over a spatial grid, updating each point at time t + 1 from neighboring grid points at times t, t−1, ..., t−k. These loop algorithms, along with all their variants, inevitably suffer from persistent cache misses because of their memory access pattern. Matteo Frigo's papers [3, 5, 4] introduced a cache-oblivious algorithm that greatly reduces the cache miss ratio. In this paper, we explore all known optimizations for stencil computations, along with some approaches of our own, and use concrete performance graphs to show which approaches are effective and which are not.
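The conventional loop nest described in the abstract can be illustrated with a minimal sketch. The code below is not taken from the paper: the function name `naive_stencil_1d`, the 1D three-point averaging kernel, the fixed boundary values, and the dependence on time t only (k = 0) are all assumptions chosen to keep the example short. It is written in plain C++ without Cilk++ keywords.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the paper): advance a 1D three-point
// averaging stencil by `steps` time steps using the conventional loop nest.
// The two boundary points are held fixed (an assumption of this sketch).
std::vector<double> naive_stencil_1d(std::vector<double> cur, int steps) {
    assert(cur.size() >= 3);
    std::vector<double> next(cur);  // copy so the fixed boundaries carry over
    for (int t = 0; t < steps; ++t) {                       // time loop
        for (std::size_t i = 1; i + 1 < cur.size(); ++i) {  // spatial sweep
            // Point i at time t + 1 depends on its neighbors at time t.
            next[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0;
        }
        std::swap(cur, next);  // the grid at time t + 1 becomes the input
    }
    return cur;
}
```

Each time step streams over the whole grid once, so when the grid is larger than the cache, every step reloads it from memory. For example, `naive_stencil_1d({0.0, 3.0, 0.0}, 1)` updates only the interior point, using the values from time t.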