Protein Retrieval via Integrative Molecular Ensembles (PRIME) through extended similarity indices
Lexin Chen, Arup Mondal, Alberto Perez, Ramón Alain Miranda-Quintana
DOI: https://doi.org/10.1101/2024.03.19.585783
2024-03-21
Abstract: Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphics processing units. Processing such ensembles on the basis of statistical mechanics can bring insights about different biologically relevant states, their representative structures, and even the dynamics between states. Clustering, which groups objects based on structural similarity, is typically used to process ensembles, yielding the different states, their populations, and representative structures. For some purposes, such as protein structure prediction, we are interested in identifying the representative structure that is most similar to the native state of the protein. The traditional pipeline applies hierarchical clustering and selects each cluster's centroid as its representative. However, even when the first cluster represents the native basin, the centroid can be several angstroms away in RMSD from the native state, and many other structures inside this cluster could be better choices of representative structure, reducing the need for protein structure refinement. In this study, we developed a module, Protein Retrieval via Integrative Molecular Ensembles (PRIME), that consists of tools to determine the most prevalent states in an ensemble using extended continuous similarity. PRIME is integrated with our Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) package and can be used as a post-processing tool for arbitrary clustering algorithms, compatible with several MD suites. PRIME was validated on ensembles of different protein and protein complex systems for its ability to reliably identify the most native-like state, which we compare to the experimental structure and to the traditional approach.
Systems were chosen to represent different degrees of difficulty, such as folding and binding processes that require large conformational changes. PRIME predictions produced structures that, when aligned to the experimental structure, were better superposed (lower RMSD). A further benefit of PRIME is its linear scaling, rather than the O(N^2) scaling traditionally associated with comparisons of elements in a set.
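The contrast between linear and quadratic scaling can be illustrated with a minimal sketch. It assumes each frame is a flattened list of already-aligned coordinates and uses the mean squared distance as a stand-in similarity measure. The function names are hypothetical, and this is not the MDANCE API or necessarily the exact index PRIME uses. It only shows that an average over all pairs of N frames can be obtained from per-coordinate sums in a single pass.

```python
def mean_pairwise_sq_dist_quadratic(frames):
    """Average squared distance over all ordered pairs of frames: O(N^2) comparisons."""
    n = len(frames)
    total = 0.0
    for a in frames:
        for b in frames:
            total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / (n * n)


def mean_pairwise_sq_dist_linear(frames):
    """Same quantity from running sums: one pass over the N frames.

    Uses the identity
        (1/N^2) * sum_ij |x_i - x_j|^2 = 2 * (mean(|x|^2) - |mean(x)|^2).
    """
    n = len(frames)
    dim = len(frames[0])
    col_sum = [0.0] * dim   # per-coordinate sum over frames
    sq_sum = 0.0            # sum of squared norms over frames
    for frame in frames:
        for k, x in enumerate(frame):
            col_sum[k] += x
            sq_sum += x * x
    mean_sq_norm = sq_sum / n
    sq_norm_of_mean = sum((s / n) ** 2 for s in col_sum)
    return 2.0 * (mean_sq_norm - sq_norm_of_mean)


# Toy "ensemble" of three frames with two coordinates each (illustrative data only).
toy_frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
quadratic = mean_pairwise_sq_dist_quadratic(toy_frames)
linear = mean_pairwise_sq_dist_linear(toy_frames)
```

Both functions return the same value on the toy data, but the second touches each frame only once, which is the kind of saving that matters for ensembles with many thousands of frames.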
Biophysics