Phaseless Auxiliary-Field Quantum Monte Carlo on Graphical Processing Units

James Shee, Evan J. Arthur, Shiwei Zhang, David R. Reichman, Richard A. Friesner
DOI: https://doi.org/10.1021/acs.jctc.8b00342
2018-04-10
Abstract: We present an implementation of phaseless Auxiliary-Field Quantum Monte Carlo (ph-AFQMC) utilizing graphical processing units (GPUs). The AFQMC method is recast in terms of matrix operations which are spread across thousands of processing cores and are executed in batches using custom Compute Unified Device Architecture kernels and the hardware-optimized cuBLAS matrix library. Algorithmic advances include a batched Sherman-Morrison-Woodbury algorithm to quickly update matrix determinants and inverses, density-fitting of the two-electron integrals, an energy algorithm involving a high-dimensional precomputed tensor, and the use of single-precision floating point arithmetic. These strategies result in dramatic reductions in wall-times for both single- and multi-determinant trial wavefunctions. For typical calculations we find speed-ups of roughly two orders of magnitude using just a single GPU card. Furthermore, we achieve near-unity parallel efficiency using 8 GPU cards on a single node, and can reach moderate system sizes via a local memory-slicing approach. We illustrate the robustness of our implementation on hydrogen chains of increasing length, and through the calculation of all-electron ionization potentials of the first-row transition metal atoms. We compare long imaginary-time calculations utilizing a population control algorithm with our previously published correlated sampling approach, and show that the latter improves not only the efficiency but also the accuracy of the computed ionization potentials. Taken together, the GPU implementation combined with correlated sampling provides a compelling computational method that will broaden the application of ph-AFQMC to the description of realistic correlated electronic systems.
Computational Physics, Strongly Correlated Electrons
What problem does this paper attempt to address?
This paper aims to make the phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) method substantially faster, and therefore more widely applicable, by implementing it on graphical processing units (GPUs). Specifically, the authors recast AFQMC in terms of matrix operations that are spread across thousands of processing cores and executed in batches, using custom CUDA kernels and the hardware-optimized cuBLAS matrix library. They also introduce several algorithmic advances: a batched Sherman-Morrison-Woodbury algorithm for quickly updating matrix determinants and inverses, density fitting of the two-electron integrals, an energy algorithm involving a high-dimensional precomputed tensor, and the use of single-precision floating-point arithmetic. Together, these strategies dramatically reduce wall times for both single- and multi-determinant trial wavefunctions. For typical calculations, a single GPU card yields a speed-up of roughly two orders of magnitude. Using 8 GPU cards on a single node achieves near-unity parallel efficiency, and moderate system sizes can be reached through a local memory-slicing approach. The robustness of the implementation is illustrated on hydrogen chains of increasing length and through the calculation of all-electron ionization potentials of the first-row transition metal atoms. Compared with long imaginary-time calculations that use a population control algorithm, the authors' previously published correlated sampling approach improves both the efficiency and the accuracy of the computed ionization potentials.
Overall, the GPU implementation combined with correlated sampling provides a compelling computational method that will broaden the application of ph-AFQMC to the description of realistic correlated electronic systems.
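To make the determinant and inverse update mentioned above more concrete, here is a minimal sketch of the rank-1 special case of the Sherman-Morrison-Woodbury identity (the Sherman-Morrison formula), applied independently to each matrix in a batch. It is an illustration under stated assumptions, not the authors' implementation: it runs in plain Python on the CPU rather than in CUDA or cuBLAS, it handles only rank-1 changes of the form A + u v^T, and the function names (`matvec`, `vecmat`, `sherman_morrison`, `batched_update`) are hypothetical.

```python
def matvec(M, x):
    """Return M @ x for a list-of-lists matrix M and a list x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def vecmat(x, M):
    """Return x^T @ M as a list."""
    return [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


def sherman_morrison(A_inv, det_A, u, v, tol=1e-12):
    """Update inverse and determinant for A' = A + u v^T.

    Uses  A'^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
    and   det(A') = det(A) * (1 + v^T A^{-1} u),
    so no new O(n^3) inversion or determinant is needed.
    """
    Au = matvec(A_inv, u)   # A^{-1} u
    vA = vecmat(v, A_inv)   # v^T A^{-1}
    ratio = 1.0 + sum(v_i * Au_i for v_i, Au_i in zip(v, Au))  # det(A') / det(A)
    if abs(ratio) < tol:
        raise ZeroDivisionError("updated matrix is numerically singular")
    n = len(A_inv)
    new_inv = [[A_inv[i][j] - Au[i] * vA[j] / ratio for j in range(n)]
               for i in range(n)]
    return new_inv, det_A * ratio


def batched_update(inverses, dets, us, vs):
    """Apply the rank-1 update to every (inverse, determinant) pair in a batch.

    A plain loop stands in here for what a GPU code would run in parallel.
    """
    return [sherman_morrison(A_inv, d, u, v)
            for A_inv, d, u, v in zip(inverses, dets, us, vs)]


if __name__ == "__main__":
    # A = diag(2, 4), so A^{-1} = diag(0.5, 0.25) and det(A) = 8.
    A_inv = [[0.5, 0.0], [0.0, 0.25]]
    new_inv, new_det = sherman_morrison(A_inv, 8.0, [1.0, 1.0], [1.0, 0.0])
    print(new_det)
    print(new_inv)
```

Each update costs O(n^2) operations per matrix instead of the O(n^3) of a fresh inversion, and the updates for different matrices are independent, which is the property that makes this kind of formula a natural fit for batched execution.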