Low-Order Finite Element Solver with Small Matrix-Matrix Multiplication Accelerated by AI-Specific Hardware for Crustal Deformation Computation

Takuma Yamaguchi, Kohei Fujita, Tsuyoshi Ichimura, Akira Naruse, Jack C. Wells, Christopher J. Zimmer, Tjerk P. Straatsma, Muneo Hori, Lalith Maddegedara, Naonori Ueda
DOI: https://doi.org/10.1145/3394277.3401860
2020-06-29
Abstract: This study proposes a fast low-order finite element solver for crustal deformation computations that exploits Tensor Cores, the AI-specific hardware on Volta GPUs. Tensor Cores can compute large matrix-matrix multiplications rapidly in half precision. We redesign a state-of-the-art solver algorithm so that lower-precision data types can be used and memory access costs are reduced even when the matrices are small. With the proposed solver, we solved two-layered problems with 13 billion degrees of freedom that mimicked the Earth's crust and mantle, using 36 compute nodes of Summit. In the matrix-vector kernel, we obtained a 4.1-fold speedup over a standard single-precision kernel. Although the proposed solver increased the FLOP count of the entire solver, it reduced the time-to-solution by 1.7-fold because the Tensor Cores provided high effective performance.
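As a rough illustration of what "lower-precision data types" means for a small matrix-matrix product, the sketch below emulates half-precision inputs with single-precision accumulation in plain Python. It is not the paper's kernel and does not run on a GPU. The function names (`to_half`, `to_single`, `matmul_mixed`, `matmul_reference`, `max_rel_error`) and the 4x4 test matrices are hypothetical, and the rounding behavior only approximates how real Tensor Core hardware accumulates.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (binary32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_mixed(a, b):
    """Multiply a (n x k) by b (k x m): inputs rounded to half precision,
    running sum rounded to single precision after every addition.
    This is an emulation under stated assumptions, not a hardware model."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                prod = to_half(a[i][p]) * to_half(b[p][j])
                acc = to_single(acc + prod)
            c[i][j] = acc
    return c


def matmul_reference(a, b):
    """Same product in ordinary Python floats (double precision)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def max_rel_error(a, b):
    """Largest relative difference between the mixed and reference products."""
    mixed = matmul_mixed(a, b)
    ref = matmul_reference(a, b)
    return max(abs(mixed[i][j] - ref[i][j]) / abs(ref[i][j])
               for i in range(len(ref)) for j in range(len(ref[0])))


# Hypothetical small, all-positive 4x4 matrices chosen only for illustration.
A = [[0.1 * (i + j + 1) for j in range(4)] for i in range(4)]
B = [[1.0 + 0.01 * (i + j) for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    print("max relative error:", max_rel_error(A, B))
```

In this emulation the error comes almost entirely from rounding the inputs to half precision. That is one reason a solver has to be designed around where reduced precision is acceptable.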