Optimal Polynomial Smoothers for Parallel AMG

Pasqua D'Ambra, Fabio Durastante, Salvatore Filippone, Stefano Massei, Stephen Thomas
2024-07-13
Abstract: In this paper, we propose Chebyshev polynomials of the 1st kind that produce an optimal bound for a polynomial-dependent constant involved in the AMG $V$-cycle error bound and do not require information about the spectrum of matrices. We formulate a variant of a minimax problem already proposed in [J. Lottes, Optimal polynomial smoothers for multigrid V-cycles, Numer. Lin. Alg. with Appl., 30 (2023), p. e2518, https://doi.org/10.1002/nla.2518] and define Chebyshev polynomials of the 1st kind as an acceleration for a weighted-Jacobi smoother. We also describe efficient GPU kernels for the application of the polynomial smoother and compare results with the accelerators defined in [Lottes, 2023] on standard benchmarks at very large scales.
Numerical Analysis
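The abstract describes first-kind Chebyshev polynomials used to accelerate a weighted-Jacobi smoother. The code below is a minimal serial sketch in Python/NumPy, not the paper's GPU kernels. It applies $k$ steps of the classical Chebyshev three-term recurrence to a Jacobi-preconditioned system on an interval $[a, 1]$, where the caller supplies $0 \le a < 1$.

As a labeled assumption, it uses the $\ell_1$-Jacobi diagonal $d_i = \sum_j |a_{ij}|$. For a symmetric positive definite $A$, this diagonal places the spectrum of $D^{-1}A$ in $(0,1]$ without any eigenvalue estimate. The paper's exact weighted-Jacobi scaling may differ. The function names are hypothetical.

```python
import numpy as np


def l1_jacobi_diagonal(A):
    """Hypothetical scaling choice (assumption): d_i = sum_j |a_ij| (l1-Jacobi).

    For symmetric positive definite A, D - A is diagonally dominant with a
    nonnegative diagonal, hence positive semidefinite, so the eigenvalues of
    D^{-1} A lie in (0, 1].
    """
    return np.abs(A).sum(axis=1)


def chebyshev_jacobi_smooth(A, b, x0, k, a):
    """Apply k first-kind Chebyshev-accelerated Jacobi steps to A x = b.

    The error x - x* is multiplied by p(D^{-1} A), where
        p(t) = T_k((1 + a - 2 t) / (1 - a)) / T_k((1 + a) / (1 - a)),
    i.e. the first-kind Chebyshev polynomial on [a, 1] scaled so p(0) = 1.
    """
    if not 0.0 <= a < 1.0:
        raise ValueError("left endpoint a must satisfy 0 <= a < 1")
    d = l1_jacobi_diagonal(A)
    theta = (1.0 + a) / 2.0        # centre of the interval [a, 1]
    delta = (1.0 - a) / 2.0        # half-width of the interval [a, 1]
    sigma = theta / delta
    x = np.array(x0, dtype=float)  # float copy; the caller's x0 is not modified
    r = b - A @ x                  # initial residual
    rho = 1.0 / sigma
    step = (r / d) / theta         # first update: Jacobi step with weight 1/theta
    for j in range(k):
        x += step
        if j == k - 1:
            break                  # no residual update needed after the last step
        r -= A @ step              # one matrix-vector product per step
        rho_new = 1.0 / (2.0 * sigma - rho)
        step = rho_new * rho * step + (2.0 * rho_new / delta) * (r / d)
        rho = rho_new
    return x
```

With true solution $x^*$, the returned iterate satisfies $x_k - x^* = p(D^{-1}A)\,(x_0 - x^*)$. Each step costs one matrix-vector product plus a few vector updates.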
What problem does this paper attempt to address?
This paper addresses the following problem in the algebraic multigrid (AMG) method: find polynomial smoothers that optimize the V-cycle error bound without depending on spectral information about the matrix. Specifically, the authors propose a class of Chebyshev polynomials of the first kind as polynomial smoothers. These smoothers attain an optimal bound for the polynomial-dependent constant in the AMG V-cycle error bound.

### Detailed Explanation

#### Research Background

The algebraic multigrid (AMG) method is a class of iterative algorithms for solving large-scale linear systems. It is especially well suited to systems arising from scalar elliptic isotropic partial differential equations (PDEs). AMG is algorithmically scalable and has optimal linear-time complexity, which makes it attractive for the challenges of exascale computing.

Traditionally, Gauss-Seidel has been the preferred smoother in AMG, but it parallelizes poorly, especially on modern multi-core processors. Researchers have therefore been looking for smoothers better suited to parallel computing environments. Polynomial smoothers are one such alternative: they are built from sparse matrix-vector products, which are well optimized on contemporary hardware.

#### Research Problems

This paper aims to solve the following problems:

1. **Optimize V-cycle error bounds**: choose polynomial smoothers that minimize the error bound of the AMG V-cycle.
2. **Independence from matrix spectral information**: design smoothers and optimization procedures that do not require spectral information about the matrix.

#### Solutions

The authors propose using Chebyshev polynomials of the first kind as smoothers and study the following minimax problem:

\[ \gamma_k := \min_{p(x) \in \Pi_k} \max_{x \in (0,1]} \left| \frac{x p(x)^2}{1 - p(x)^2} \right| \]

where $\Pi_k$ is the set of polynomials satisfying $p(0) = 1$ and $|p(x)| \leq 1$ for all $x\in[0,1]$. The specific steps are as follows:

1. **Define Chebyshev polynomials**: apply Chebyshev polynomials of the first kind on the interval $[a, 1]$ and optimize the left endpoint $a$ by solving a nonlinear equation.
2. **GPU implementation**: develop efficient NVIDIA GPU kernels that apply these polynomial accelerators within the PSCToolkit framework.

#### Experimental Results

Experiments show results for low-order polynomials ($k\leq5$). In this range, the error bounds obtained with Chebyshev polynomials of the first kind are lower than those obtained with Chebyshev polynomials of the fourth kind. They are also comparable to numerical approximations of the optimal bounds.

### Summary

The main contribution of this paper is a polynomial smoother based on Chebyshev polynomials of the first kind. For low degrees, it achieves better AMG V-cycle error bounds than fourth-kind Chebyshev polynomials, and it does not require spectral information about the matrix. In addition, the authors developed an efficient GPU implementation and report superior performance in large-scale benchmark tests.
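The minimax quantity $\gamma_k$ from the Solutions section can be explored numerically for the first-kind family. The sketch below assumes NumPy. It evaluates $x\,p(x)^2/(1-p(x)^2)$ on a grid in $(0,1]$ for the first-kind Chebyshev polynomial on $[a,1]$, normalized so that $p(0)=1$. It then picks $a$ by a plain grid search.

The grid search is a swapped-in stand-in for the nonlinear equation the authors solve for $a$. The grid sizes and function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def p_first_kind(x, k, a):
    """First-kind Chebyshev polynomial of degree k on [a, 1], scaled so p(0) = 1."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # selects T_k in the Chebyshev basis
    # Affine map sending [a, 1] to [1, -1]; x = 0 maps to (1 + a) / (1 - a) > 1.
    s = (1.0 + a - 2.0 * np.asarray(x, dtype=float)) / (1.0 - a)
    return cheb.chebval(s, coeffs) / cheb.chebval((1.0 + a) / (1.0 - a), coeffs)


def gamma(k, a, n=10001):
    """Grid approximation of max over x in (0, 1] of x p(x)^2 / (1 - p(x)^2)."""
    x = np.linspace(1e-6, 1.0, n)  # skip x = 0, where the ratio is 0/0
    p2 = p_first_kind(x, k, a) ** 2
    return float(np.max(x * p2 / (1.0 - p2)))


def best_left_endpoint(k, grid=None):
    """Hypothetical grid search over a (stand-in for the paper's nonlinear equation)."""
    if grid is None:
        # a = 0 is excluded: there |p(1)| = 1 and the ratio is unbounded.
        grid = np.linspace(0.01, 0.9, 891)
    values = [gamma(k, a) for a in grid]
    i = int(np.argmin(values))
    return float(grid[i]), values[i]
```

For $k=1$ the polynomial is $p(x) = 1 - 2x/(1+a)$, and the maximum equals $\max\{(1+a)/4,\ (1-a)^2/(4a)\}$. The two terms balance at $a = 1/3$, giving $\gamma = 1/3$, so `best_left_endpoint(1)` should return values close to $1/3$ for both.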