A scalable barotropic mode solver for the parallel ocean program

Yong Hu, Xiaomeng Huang, Xiaoge Wang, Haohuan Fu, Shizhen Xu, Huabin Ruan, Wei Xue, Guangwen Yang
DOI: https://doi.org/10.1007/978-3-642-40047-6_74
2013-01-01
Abstract: This paper presents a novel strategy to improve the scalability of the barotropic mode in the Parallel Ocean Program (POP), based on a theoretical analysis of the barotropic communication bottleneck. POP discretizes the elliptic equations of the barotropic mode into a linear system Ax=b and solves it using the Preconditioned Conjugate Gradient (PCG) method. PCG scales poorly on distributed systems because of the time-consuming global reductions needed by the inner products in each iteration. A performance model is developed to quantify the scaling bottleneck of PCG. Based on this model, the classical Stiefel iteration (CSI), which was originally supposed to be less efficient than PCG, is identified as promising for massive parallelism. In contrast to PCG, the recurrence parameters of CSI are determined by the spectrum of the coefficient matrix A rather than by inner products of the residuals from previous iterations. The Lanczos method is used to resolve the difficulty of estimating the eigenvalues of the large-scale matrix A: it constructs a small tridiagonal matrix whose eigenvalues are close to those of A. By replacing PCG with CSI, global reductions and their inherently poor scalability are eliminated from the barotropic mode. The implementation of CSI in POP at 0.1 degree resolution accelerates one barotropic step nearly fivefold, from 1.23 s to 0.26 s, on 15,000 cores.
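The contrast between PCG and CSI can be illustrated with a minimal sketch, assuming CSI takes the form of the textbook unpreconditioned Chebyshev iteration for a symmetric positive definite system. This is not the POP implementation: the names `chebyshev_solve` and `tridiag_matvec` are hypothetical, the test matrix is a small tridiagonal stand-in for the barotropic operator, and the spectral bounds are supplied directly rather than estimated with the Lanczos method. The point to notice is that `alpha` and `beta` are computed from the bounds alone, so the loop contains no inner products and hence would need no global reductions in a distributed setting.

```python
def chebyshev_solve(matvec, b, lam_min, lam_max, num_iters):
    """Chebyshev-type iteration for A x = b with A symmetric positive definite.

    matvec(v) returns A applied to v; lam_min and lam_max are assumed
    bounds on the smallest and largest eigenvalues of A.
    """
    d = (lam_max + lam_min) / 2.0  # centre of the spectral interval
    c = (lam_max - lam_min) / 2.0  # half-width of the spectral interval
    x = [0.0] * len(b)             # initial guess x = 0
    r = list(b)                    # residual b - A x for x = 0
    p = list(r)                    # first search direction
    alpha = 1.0 / d
    for k in range(num_iters):
        if k > 0:
            # Recurrence parameters depend only on d, c and the previous
            # alpha: no inner products of residuals are required.
            beta = 0.5 * (c * alpha) ** 2 if k == 1 else (c * alpha / 2.0) ** 2
            alpha = 1.0 / (d - beta / alpha)
            p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        ap = matvec(p)
        r = [ri - alpha * api for ri, api in zip(r, ap)]
    return x


def tridiag_matvec(v):
    """Apply the tridiagonal matrix with 4 on the diagonal and -1 off it.

    Its eigenvalues lie strictly between 2 and 6.
    """
    n = len(v)
    out = []
    for i in range(n):
        s = 4.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


if __name__ == "__main__":
    rhs = [1.0] * 10
    sol = chebyshev_solve(tridiag_matvec, rhs, 2.0, 6.0, 60)
    res = max(abs(bi - ai) for bi, ai in zip(rhs, tridiag_matvec(sol)))
    print("max residual:", res)
```

In this sketch a convergence check would be the only place a global norm is needed, and it could be performed every few iterations rather than every iteration; whether POP's CSI does so is not stated in the abstract.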