Programming FFT on DSM Multiprocessors

Hongzhang Shan, Jianhua Feng, Hongzhong Shan
DOI: https://doi.org/10.1109/hpc.2000.843504
2000-01-01
Abstract: The performance of the shared address space programming model for the coarse-grained communicating programs that have traditionally been common in scientific computing is not yet clear. We use the challenging 1-dimensional FFT, a regular coarse-grained program, as our driving application to study how to obtain high performance for this kind of application under the shared address space programming model on a hardware-supported cache-coherent distributed memory machine. We find that its performance is highly sensitive to data placement, and proper data placement is critical to the success of this kind of application. Prefetching further improves performance by 10 to 50 percent for the data sets we studied. Naive programming easily creates a performance bottleneck by introducing much more contention, leading to a large performance loss. When properly written, shared address space programs deliver much better performance than the other popular programming models, such as MPI and SHMEM.