Embracing a new era of highly efficient and productive quantum Monte Carlo simulations

Amrita Mathuriya, Ye Luo, Raymond C. Clay III, Anouar Benali, Luke Shulenburger, Jeongnim Kim

DOI: https://doi.org/10.1145/3126908.3126952

2017-08-09

Abstract: QMCPACK has enabled cutting-edge materials research on supercomputers for over a decade. It scales nearly ideally but has low single-node efficiency because its physics-based abstractions use array-of-structures objects, which vectorize poorly. We present a systematic approach to transform QMCPACK to better exploit the new hardware features of modern CPUs in portable and maintainable ways. We develop miniapps for fast prototyping and optimization. We implement new containers in a structure-of-arrays data layout to facilitate vectorization by compilers. Further speedups and smaller memory footprints are obtained by computing data on the fly with the vectorized routines and by expanding single-precision use. All of these are seamlessly incorporated into production QMCPACK. We demonstrate up to 4.5x speedups on recent Intel processors and IBM Blue Gene/Q for representative workloads. Energy consumption is reduced significantly, commensurate with the speedup factor. Memory footprints are reduced by up to 3.8x, opening the possibility of solving much larger problems in the future.

Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

This paper aims to improve the efficiency and productivity of quantum Monte Carlo (QMC) simulations on modern high-performance computing (HPC) systems, focusing on performance optimization of the QMCPACK software package. QMCPACK has been used for materials science research on supercomputers for many years, but its single-node efficiency is low, especially in how well it exploits the new hardware features of modern CPUs. The paper proposes a systematic method to transform QMCPACK so that it better utilizes those features, such as multiple cores, multiple hardware threads, and wide SIMD units, and thereby achieves higher single-node efficiency while remaining portable and maintainable.

The specific goals of the paper include:

1. **Improve SIMD efficiency**: Improve the vectorization efficiency of key computing kernels by introducing Structure-of-Arrays (SoA) data types.
2. **Reduce memory footprint**: Reduce memory usage through mixed-precision computing and compute-on-the-fly algorithms.
3. **Maintain code portability and maintainability**: All improvements are based on the C++11 and OpenMP 4 standards and do not rely on platform-specific optimizations.

These improvements raise computational efficiency, significantly reduce energy consumption, and make it possible to solve larger-scale problems. The paper demonstrates the effects of these optimizations on four representative benchmark systems (graphite, beryllium-64, NiO-32, and NiO-64), showing a speedup of up to 4.5 times and a memory footprint reduction of up to 3.8 times. These gains matter for tackling more complex and larger-scale materials science problems in the future.
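The difference between the array-of-structures and structure-of-arrays layouts, together with single-precision storage and on-the-fly computation, can be illustrated with a minimal C++11 sketch. This is not QMCPACK code: the names `PosAoS`, `PosSoA`, `distances_aos`, `distances_soa`, and `sum_inverse_distances` are hypothetical, and the distance kernel is only an assumed stand-in for the kind of loop the paper vectorizes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Array-of-structures (hypothetical type): the x, y, z of one particle are
// adjacent in memory, so a loop over particles reads each coordinate with stride 3.
struct PosAoS { float x, y, z; };

// Structure-of-arrays (hypothetical type): each coordinate is its own
// contiguous array, so a loop over particles reads unit-stride data.
struct PosSoA {
    std::vector<float> x, y, z;
    explicit PosSoA(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Distances from the point (px, py, pz) to every particle, AoS layout.
inline void distances_aos(const std::vector<PosAoS>& r,
                          float px, float py, float pz,
                          std::vector<float>& out) {
    assert(out.size() == r.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        const float dx = r[i].x - px;
        const float dy = r[i].y - py;
        const float dz = r[i].z - pz;
        out[i] = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
}

// Same computation, SoA layout. The OpenMP 4 `simd` directive is a portable
// hint to the compiler; without OpenMP enabled the pragma is simply ignored.
// Distances are recomputed on the fly into a caller-provided buffer instead
// of being kept in a persistent N x N table.
inline void distances_soa(const PosSoA& r,
                          float px, float py, float pz,
                          std::vector<float>& out) {
    const std::size_t n = r.size();
    assert(out.size() == n);
    const float* x = r.x.data();
    const float* y = r.y.data();
    const float* z = r.z.data();
    float* d = out.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        const float dx = x[i] - px;
        const float dy = y[i] - py;
        const float dz = z[i] - pz;
        d[i] = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
}

// Mixed precision (assumed illustration): per-element values stay in single
// precision, while the reduction is accumulated in double precision.
inline double sum_inverse_distances(const std::vector<float>& d) {
    double s = 0.0;
    for (float v : d) s += 1.0 / static_cast<double>(v);
    return s;
}
```

Both kernels produce the same values; the SoA version only changes the memory access pattern so that the compiler can issue contiguous vector loads. Storing coordinates and distances as `float` halves the memory for those arrays relative to `double`, which is the general idea behind the memory-footprint goal, though the actual containers and precision policy in QMCPACK may differ.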

