Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

T. McDaniel, E. F. D'Azevedo, Y. W. Li, K. Wong, P. R. C. Kent
DOI: https://doi.org/10.1063/1.4998616
2017-08-02
Abstract: Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.
Computational Physics, Materials Science
What problem does this paper attempt to address?
This paper addresses the high cost of evaluating Slater determinants for large systems in ab initio Quantum Monte Carlo (QMC) simulations. Specifically:

1. **Computational Bottleneck**: Each Monte Carlo step in a QMC simulation requires the determinant of a dense matrix. The usual approach applies a rank-1 Sherman-Morrison update so that the inverse matrix is not recomputed explicitly. Each single-electron update then costs \(O(N^2)\), so a sweep over all \(N\) electrons costs \(O(N^3)\). For large systems, such as solids with many electrons, this overhead becomes very large.
2. **Proposed New Algorithm**: To improve numerical efficiency, the authors propose a multi-rank delayed update scheme. Accepted moves are not applied to the matrices immediately. Instead, they are accumulated until a predetermined number \(K\) is reached and then applied as a batch, which replaces matrix-vector operations with more efficient matrix-matrix operations. This raises the arithmetic intensity and computational efficiency while leaving the Monte Carlo sampling and its statistical efficiency unchanged.
3. **Applicable Scenarios**: The algorithm is particularly suitable for QMC methods with a high acceptance ratio, such as Diffusion Monte Carlo (DMC), because in that regime the delayed update reduces the update time significantly.
4. **Performance Improvement**: The reported results show an order-of-magnitude speedup on multi-core CPUs and GPUs for matrices of size approximately 1000, and the speedup grows as the matrix size increases.

In summary, this paper aims to reduce the computational cost of QMC simulations significantly by improving the Slater determinant update algorithm, and in particular provides an effective method for efficient simulation of large systems.
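The delayed update idea summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sherman_morrison_row_update`, `DelayedUpdater`, and `demo` are hypothetical, each row of the Slater matrix \(A\) is assumed to belong to one electron, every proposed move is accepted for simplicity, and each electron is assumed to move at most once per window of \(K\) moves. Pending rows \(V\) for electrons \(p\) are folded into the inverse with the Woodbury identity, \(A'^{-1} = A^{-1} - A^{-1}_{:,p}\, S^{-1} (V A^{-1} - E^T)\), where \(S = (V A^{-1})_{:,p}\) and \(E^T\) is the \(k \times N\) selector with ones at positions \((i, p_i)\), so the flush consists of matrix-matrix products.

```python
import numpy as np


def sherman_morrison_row_update(Ainv, v, e):
    """Rank-1 reference: replace row e of A by v; update Ainv in place."""
    ratio = v @ Ainv[:, e]            # det(A_new) / det(A_old)
    row = v @ Ainv                    # v^T A^{-1}
    row[e] -= 1.0                     # v^T A^{-1} - e_e^T
    Ainv -= np.outer(Ainv[:, e], row) / ratio
    return ratio


class DelayedUpdater:
    """Hypothetical sketch: defer accepted row replacements, apply K at once."""

    def __init__(self, Ainv, K):
        self.Ainv = Ainv.copy()       # inverse at the start of the window
        self.K = K
        self.p = []                   # electrons with pending accepted moves
        self.V = []                   # their new orbital rows

    def ratio(self, v, e):
        """Determinant ratio for replacing row e by v, given pending moves."""
        assert e not in self.p, "sketch assumes distinct electrons per window"
        r = v @ self.Ainv[:, e]
        if self.p:
            V = np.array(self.V)
            # S is recomputed here for clarity; it could be built incrementally.
            S = V @ self.Ainv[:, self.p]              # k x k
            rhs = V @ self.Ainv[:, e]                 # length k
            r -= (v @ self.Ainv[:, self.p]) @ np.linalg.solve(S, rhs)
        return r

    def accept(self, v, e):
        self.p.append(e)
        self.V.append(v.copy())
        if len(self.p) == self.K:
            self.flush()

    def flush(self):
        """Apply all pending moves to the inverse via the Woodbury identity."""
        if not self.p:
            return
        V = np.array(self.V)
        B = V @ self.Ainv                             # k x N matrix-matrix product
        S = B[:, self.p]                              # k x k (fancy index copies)
        B[np.arange(len(self.p)), self.p] -= 1.0      # B = V A^{-1} - E^T
        self.Ainv -= self.Ainv[:, self.p] @ np.linalg.solve(S, B)
        self.p, self.V = [], []


def demo(N=12, K=4, seed=0):
    """Compare delayed ratios and the final inverse against references."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned test matrix
    ref_inv = np.linalg.inv(A)
    upd = DelayedUpdater(ref_inv, K)
    ratio_err = 0.0
    for e in range(N):                                # one sweep, all moves accepted
        v = rng.standard_normal(N)
        v[e] += N
        r_delayed = upd.ratio(v, e)
        r_ref = sherman_morrison_row_update(ref_inv, v, e)
        ratio_err = max(ratio_err, abs(r_delayed - r_ref))
        upd.accept(v, e)
        A[e] = v
    upd.flush()
    inv_err = np.abs(upd.Ainv - np.linalg.inv(A)).max()
    return ratio_err, inv_err
```

Calling `demo()` returns two discrepancies: the largest mismatch between the delayed and rank-1 determinant ratios, and the largest mismatch between the delayed inverse and a direct `np.linalg.inv`. Both should be at round-off level, which illustrates that the delayed scheme yields the same acceptance probabilities as the rank-1 scheme. The window length `K` trades a small amount of extra work per ratio against larger, more efficient batched updates.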