An alternative GPU acceleration for a pseudopotential plane-waves density functional theory code with applications to metallic systems

Xuejun Gong, Andrea Dal Corso
DOI: https://doi.org/10.1016/j.cpc.2024.109439
2024-12-03
Abstract: We present an alternative GPU acceleration for plane waves pseudopotentials electronic structure codes designed for systems that have small unit cells but require a large number of k points to sample the Brillouin zone as happens, for instance, in metals. We discuss the diagonalization of the Kohn and Sham equations and the solution of the linear system derived in density functional perturbation theory. Both problems take advantage from a rewriting of the routine that applies the Hamiltonian to the Bloch wave-functions to work simultaneously (in parallel on the GPU threads) on the wave-functions with different wave-vectors k, as many as allowed by the GPU memory. Our implementation is written in CUDA Fortran and makes extensive use of kernel routines that run on the GPU (GLOBAL routines) or can be called from inside the GPU threads (DEVICE routines). We compare our method with the CPUs only calculation and with the approach currently implemented in Quantum ESPRESSO that uses GPU accelerated libraries for the FFT and for the linear algebra tasks such as the matrix-matrix multiplications as well as OpenACC directives for loop parallelization. We show in a realistic example that our method can give a significant improvement in the cases for which it has been designed.
Materials Science
What problem does this paper attempt to address?
This paper addresses the low efficiency of existing GPU-accelerated plane-wave pseudopotential electronic structure codes on metallic systems that have small unit cells but require a large number of k-points to sample the Brillouin zone. For such small systems, current GPU acceleration methods can sometimes be even slower than a CPU-only calculation. The reason is that GPU implementations are usually tested on large supercells containing many atoms, where the time spent computing on the GPU exceeds the time required to transfer data from the CPU to the GPU; small systems have been neglected in these tests. To improve this situation, the authors propose a new GPU acceleration method designed for metallic systems with a large number of k-points. The method loads as many wave functions (i.e., k-points) as the GPU memory allows and performs the calculations on all of them simultaneously (for example, applying the Hamiltonian to the wave functions), with each GPU thread processing one wave function or part of it. This can significantly speed up calculations for small systems.
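The batching idea described above can be illustrated with a minimal Python sketch. It is not the authors' CUDA Fortran implementation: the function names, the memory model (a fixed number of complex work arrays per k-point), and the flat thread-to-work mapping are all assumptions made here for illustration only.

```python
def kpoints_per_batch(gpu_mem_bytes, n_pw, n_bands,
                      n_work_arrays=3, bytes_per_complex=16):
    """Estimate how many k-points fit in GPU memory at once.

    Assumed (hypothetical) memory model: each k-point needs
    n_work_arrays complex arrays of shape (n_pw, n_bands).
    """
    bytes_per_k = n_pw * n_bands * n_work_arrays * bytes_per_complex
    # Always process at least one k-point per batch.
    return max(1, gpu_mem_bytes // bytes_per_k)


def make_batches(n_kpoints, batch_size):
    """Split k-point indices 0..n_kpoints-1 into consecutive batches."""
    return [range(start, min(start + batch_size, n_kpoints))
            for start in range(0, n_kpoints, batch_size)]


def thread_to_work(global_thread_id, n_bands):
    """Map a flat thread index to (k-point index in batch, band index).

    Assumed mapping: one thread per wave function (k-point, band).
    """
    return divmod(global_thread_id, n_bands)


if __name__ == "__main__":
    batch = kpoints_per_batch(gpu_mem_bytes=1_000_000, n_pw=100, n_bands=10)
    for ks in make_batches(n_kpoints=45, batch_size=batch):
        print(f"process k-points {ks.start}..{ks.stop - 1} together")
```

Under these assumptions, the batch size is limited only by the available memory, so a small unit cell (few plane waves and bands per k-point) allows many k-points to be processed in a single pass, which is the regime the paper targets.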