Running and analyzing massively parallel molecular simulations

Sukrit Singh, Sonya Hanson
DOI: https://doi.org/10.26434/chemrxiv-2024-vwpww
2024-10-15
Abstract: Protein conformational landscapes contain functionally relevant information for understanding biological processes. Mapping out these landscapes provides valuable insight into protein behavior and biological phenomena, and is relevant to therapeutic design. While experimental structural biology methods (X-ray crystallography, cryo-EM, NMR) can provide high-resolution structures, they struggle to capture the full conformational landscapes of biomolecules. Molecular dynamics (MD) simulations are a powerful tool for exploring these landscapes at atomic resolution. However, inferring functionally relevant information, such as the full conformational pathway of a long-timescale process, the impact of mutations on binding, or allosteric coupling between distant residues, requires more extensive sampling than a single MD simulation can typically achieve. This sampling limitation can be circumvented by generating datasets of many parallel molecular simulations, a powerful approach for sampling long-timescale events and studying complex biological phenomena. Here, we discuss recent advances and present a practical guide to generating massively parallel MD datasets. We start by detailing the practical considerations that precede generating a dataset, from storage needs to the timescales the dataset can address, as well as modern simulation engines. We then discuss how to analyze these datasets to build unified models of conformational space, including future insights that distributed simulation architectures may make possible.
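The abstract does not say which unified model the authors build. As an illustrative assumption only, a Markov state model (MSM) is one widely used way to combine many short, independent trajectories into a single model: each trajectory is discretized into conformational states, transition counts are pooled across all trajectories (never across trajectory boundaries), and the pooled counts are normalized into a transition matrix. The sketch below assumes trajectories have already been clustered into integer state labels; the functions `pooled_transition_matrix` and `stationary_distribution` and the toy `trajs` data are hypothetical. Symmetrizing the count matrix is a crude way to impose detailed balance; dedicated MSM packages provide more rigorous reversible estimators.

```python
import numpy as np


def pooled_transition_matrix(trajectories, n_states, lag=1):
    """Pool transition counts from many independent discretized trajectories
    into one row-stochastic transition matrix (hypothetical helper)."""
    counts = np.zeros((n_states, n_states), dtype=float)
    for traj in trajectories:
        traj = np.asarray(traj)
        # Count i -> j transitions separated by `lag` frames, within this
        # trajectory only; pairs never span two different simulations.
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += 1
    # Naive symmetrization (assumed detailed-balance approximation).
    counts = counts + counts.T
    row_sums = counts.sum(axis=1, keepdims=True)
    # Unvisited states get an all-zero row instead of NaNs.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(row_sums > 0, counts / row_sums, 0.0)


def stationary_distribution(T):
    """Equilibrium populations: left eigenvector of T with eigenvalue ~1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()  # normalizing also fixes the eigenvector's sign


# Toy example: three short "parallel" trajectories over 3 states.
trajs = [[0, 0, 1, 1, 2], [2, 2, 1, 0, 0], [1, 2, 2, 2, 1]]
T = pooled_transition_matrix(trajs, n_states=3)
pi = stationary_distribution(T)
print(np.round(T, 3))
print(np.round(pi, 3))
```

No single toy trajectory visits every transition often, but the pooled counts yield one transition matrix and one set of equilibrium populations for the whole dataset, which is the sense in which parallel simulations can be combined into a unified model.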
Chemistry