Multi-GPU UNRES for scalable coarse-grained simulations of very large protein systems
Krzysztof M. Ocetkiewicz, Cezary Czaplewski, Henryk Krawczyk, Agnieszka G. Lipska, Adam Liwo, Jerzy Proficz, Adam K. Sieradzan, Paweł Czarnul
DOI: https://doi.org/10.1016/j.cpc.2024.109112
IF: 4.717
2024-02-02
Computer Physics Communications
Abstract: Graphics processing units (GPUs) are now widely used in all-atom molecular simulations because atom pairs can be partitioned efficiently between kernels that compute the contributions to energy and forces, which enables the treatment of very large systems. Extension of the time and size scales of computations is also sought through the development of coarse-grained (CG) models, in which atoms are merged into extended interaction sites. Implementing CG codes on GPUs, particularly on multi-GPU platforms, is nevertheless a challenge: the potentials are more complicated and the explicit solvent is removed, which forces developers to use interaction-domain rather than space-domain decomposition. In this paper, we propose a design of a multi-GPU coarse-grained simulator and report the implementation of the heavily coarse-grained, physics-based UNited RESidue (UNRES) model of polypeptide chains. By moving all computations to the GPUs and keeping communication with the CPUs to a minimum, we achieved an almost 5-fold speed-up with 8 A100 GPU accelerators for systems with over 200,000 amino-acid residues. This result makes UNRES the best-scaling coarse-grained software and enables millisecond-scale (laboratory time) simulations of cell components such as tubulin within days of wall-clock time.

Program summary
Program Title: Multi-GPU UNRES
CPC Library link to program files: https://doi.org/10.17632/hz9s4nwncf.1
Developer's repository link: https://projects.task.gda.pl/eurohpcpl-public/unres
Licensing provisions: GPLv3
Programming language: Fortran + C++/CUDA
Nature of problem: Physics-based simulations of protein systems at biologically relevant time and size scales are demanding and consequently require both a simplified representation of the biomolecules and substantial computational resources. UNRES (from UNited RESidue) is a physics-based reduced model of polypeptide chains for running large-scale coarse-grained simulations of protein structure and dynamics. It enables researchers to study protein folding, protein dynamics, and protein-protein interactions in a physically realistic manner and to uncover the mechanisms of biological processes. Examples of biological applications include studies of amyloid formation, signaling mechanisms, and the action of molecular chaperones.
Solution method: The presented Multi-GPU UNRES relies on a highly optimized GPU implementation of non-central forces using modern CUDA constructs. This is made possible by the proposed efficient partitioning and assignment of the interaction domain to GPU resources. We moved as many computations as possible to the device (GPU) side. In most cases, computations are defined and scheduled as CUDA graphs; in selected cases, scheduling kernels manually yields slightly better performance. To maximize parallelism, multiple CUDA streams are used. Furthermore, the code visibly benefits from a tree-based, shared-memory allreduce algorithm. If the hardware supports it, peer memory access is enabled between all GPUs and the allreduce algorithm takes advantage of it. These features have made the UNRES coarse-grained protein model with implicit solvent scalable on multiple GPUs, so that we could achieve an almost 5-fold speed-up with 8 A100 GPU accelerators for systems with over 200,000 amino-acid residues.
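The tree-based allreduce named in the solution method can be illustrated with a minimal host-side sketch. This is not the UNRES implementation: the per-GPU buffers are simulated with `std::vector`, the names `Buffer` and `tree_allreduce_sum` are hypothetical, and the pairing pattern is one common choice assumed here for illustration. In a real multi-GPU code, each addition or copy between two buffers would correspond to a peer-to-peer or shared-memory transfer between devices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One buffer per simulated GPU (assumption: on real hardware each buffer
// would live in the memory of a different device).
using Buffer = std::vector<double>;

// Tree-based allreduce (element-wise sum) over per-device buffers.
// Reduce phase: with stride s = 1, 2, 4, ..., device i (a multiple of 2s)
// accumulates the buffer of device i + s; device 0 ends up with the total.
// Broadcast phase: the same tree is walked in reverse and the total is
// copied back down to every device.
void tree_allreduce_sum(std::vector<Buffer>& dev) {
    const std::size_t n = dev.size();
    if (n == 0) return;
    const std::size_t len = dev[0].size();
    for (const Buffer& b : dev) assert(b.size() == len);

    // Reduce phase: pairs within one round are independent of each other,
    // so on real hardware they could run concurrently.
    std::size_t stride = 1;
    for (; stride < n; stride *= 2) {
        for (std::size_t i = 0; i + stride < n; i += 2 * stride) {
            for (std::size_t k = 0; k < len; ++k) {
                dev[i][k] += dev[i + stride][k];
            }
        }
    }

    // Broadcast phase: stride is now the smallest power of two >= n,
    // so start from half of it and go back down to 1.
    for (stride /= 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 0; i + stride < n; i += 2 * stride) {
            dev[i + stride] = dev[i];
        }
    }
}
```

With 8 buffers, the reduce phase takes 3 rounds (strides 1, 2, 4) and the broadcast phase another 3, compared with 7 sequential accumulation steps if every buffer were added onto a single device one after another.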
Physics, Mathematical; Computer Science, Interdisciplinary Applications