Enhancing Scalability of a Matrix-Free Eigensolver for Studying Many-Body Localization

Roel Van Beeumen, Gregory D. Kahanamoku-Meyer, Norman Y. Yao, Chao Yang
DOI: https://doi.org/10.1177/10943420211060365
2022-01-01
The International Journal of High Performance Computing Applications
Abstract: We propose several techniques to enhance the parallel scalability of a matrix-free eigensolver designed for studying many-body localization (MBL) of quantum spin chain models with nearest-neighbor interactions and on-site disorder. This type of problem is computationally challenging because the dimension of the associated Hamiltonian matrix grows exponentially with the number of spins L, and we need to average over different realizations of the random disorder to obtain relevant statistical behavior. For each disorder realization, we need to compute eigenvalues from different regions of the spectrum and their corresponding eigenvectors. In previous work, the interior eigenstates for a single eigenvalue problem were computed via the shift-and-invert Lanczos algorithm. Due to the extremely high memory footprint of the LU factorizations, this technique is not well suited for large L. For example, we need thousands of compute nodes on modern high performance computing infrastructures to go beyond L = 24. The matrix-free approach does not suffer from this memory bottleneck; however, its scalability is limited by a computation and communication load imbalance. To reduce this imbalance and to significantly enhance the scalability of the matrix-free eigensolver, we reorder the matrix and leverage the consistent space runtime, CSPACER. We also show the efficiency of CSPACER in managing irregular communication patterns at scale compared to optimized MPI non-blocking two-sided and one-sided RMA implementation variants. This effort enables us to study MBL for spin chains with a larger number of spins. The efficiency and effectiveness of the proposed algorithm are demonstrated by computing eigenstates on a massively parallel many-core high performance computer.
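To illustrate what "matrix-free" means here, the following is a minimal serial sketch, not the authors' parallel implementation. It assumes a spin-1/2 Heisenberg chain with open boundaries and random on-site fields, a model of the kind the abstract describes but does not name explicitly. The function name `apply_hamiltonian`, the bit encoding of basis states, and the coupling parameter `J` are all hypothetical choices made for this example. The product H·v is computed on the fly from the spin configurations, so the 2^L × 2^L matrix is never stored.

```python
def apply_hamiltonian(v, fields, J=1.0):
    """Return H @ v without ever forming H (illustrative sketch).

    Assumed model (not taken from the abstract):
        H = J * sum_i S_i . S_{i+1} + sum_i h_i * S^z_i
    on an open chain of L = len(fields) spin-1/2 sites.
    Basis state index s stores spin i in bit i (1 = up, 0 = down),
    so v must have length 2**L.
    """
    L = len(fields)
    out = [0.0] * len(v)
    for s, amp in enumerate(v):
        if amp == 0.0:
            continue  # nothing to propagate from this basis state
        diag = 0.0
        for i in range(L):
            up_i = (s >> i) & 1
            # On-site disorder term h_i * S^z_i
            diag += fields[i] * (0.5 if up_i else -0.5)
            if i + 1 < L:
                up_j = (s >> (i + 1)) & 1
                if up_i == up_j:
                    # Parallel neighbors: only the S^z S^z part contributes
                    diag += 0.25 * J
                else:
                    # Antiparallel neighbors: S^z S^z part ...
                    diag -= 0.25 * J
                    # ... plus the flip-flop part, which swaps the two spins
                    t = s ^ (0b11 << i)
                    out[t] += 0.5 * J * amp
        out[s] += diag * amp
    return out


# Two-site check with no disorder: the singlet (|up,down> - |down,up>)
# is an eigenvector with eigenvalue -3/4.
singlet = [0.0, 1.0, -1.0, 0.0]
print(apply_hamiltonian(singlet, [0.0, 0.0]))  # [0.0, -0.75, 0.75, 0.0]
```

Only the input and output vectors are held in memory, which is the property that avoids the LU-factorization footprint mentioned above. In a distributed setting the flip-flop targets `t` may live on other processes, which is one plausible source of the irregular communication the abstract refers to.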