Automated Collective Variable Discovery for MFSD2A transporter from molecular dynamics simulations

Myongin Oh, Margarida Rosa, Hengyi Xie, George Khelashvili
DOI: https://doi.org/10.1101/2024.04.19.590308
2024-04-25
Abstract: Biomolecules often exhibit complex free energy landscapes in which long-lived metastable states are separated by large energy barriers. Overcoming these barriers to robustly sample transitions between the metastable states with classical molecular dynamics (MD) simulations presents a challenge. To circumvent this issue, collective variable (CV)-based enhanced sampling MD approaches are often employed. Traditional CV selection relies on intuition and prior knowledge of the system. This approach introduces bias, which can lead to incomplete mechanistic insights. Thus, automated CV detection is desired to gain a deeper understanding of the system/process. Machine learning analyses of MD data based on algorithms such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) have been implemented for automated CV detection. However, their performance has not been systematically evaluated on structurally and mechanistically complex biological systems. Here, we applied these methods to MD simulations of the MFSD2A (Major Facilitator Superfamily Domain 2A) lysolipid transporter in multiple functionally relevant metastable states with the goal of identifying optimal CVs that would structurally discriminate these states. Specific emphasis was placed on the automated detection and interpretive power of LDA-based CVs. We found that LDA methods, including GDHLDA, a novel gradient descent-based multiclass harmonic LDA variant developed here, outperform PCA in class separation and show remarkable consistency in extracting CVs critical for distinguishing metastable states. Furthermore, the identified CVs included features previously associated with conformational transitions in MFSD2A. Specifically, conformational shifts in transmembrane helix 7 and in residue Y294 on this helix emerged as critical features discriminating the metastable states in MFSD2A.
This highlights the effectiveness of LDA-based approaches in automatically extracting functionally relevant CVs from MD trajectories; such CVs can be used to drive biased MD simulations that efficiently sample conformational transitions in the molecular system.
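The harmonic LDA idea behind these CVs can be sketched as follows. This is a minimal illustration, assuming the multiclass harmonic LDA (HLDA) formulation used in the enhanced sampling literature: the within-class scatter is the harmonic mean of the per-state covariance matrices, S_w^{-1} = sum_i Sigma_i^{-1}, the between-class scatter is S_b = sum_i (mu_i - mu)(mu_i - mu)^T, and the CVs are the leading solutions of S_b w = lambda S_w w. It solves that problem in closed form and is not the authors' GDHLDA, which uses a gradient descent-based optimization. The function name `hlda_cvs`, the `ridge` regularization, the choice of mu as the unweighted mean of the state means, and the toy data are all hypothetical.

```python
import numpy as np


def hlda_cvs(X, labels, n_cvs=None, ridge=1e-6):
    """Closed-form multiclass harmonic LDA sketch (hypothetical helper).

    X      : (n_frames, n_features) array of MD descriptors (e.g. distances).
    labels : (n_frames,) metastable-state label for each frame.
    Returns (eigenvalues, W); column k of W holds the coefficients of CV k.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_features = X.shape[1]
    if n_cvs is None:
        n_cvs = len(classes) - 1  # at most C - 1 informative directions

    means = []
    inv_sum = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[labels == c]
        means.append(Xc.mean(axis=0))
        # Per-state covariance plus a small ridge (assumption) so it is invertible
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(n_features)
        inv_sum += np.linalg.inv(cov)  # harmonic mean: S_w^{-1} = sum_i Sigma_i^{-1}
    means = np.array(means)
    grand_mean = means.mean(axis=0)  # unweighted mean of state means (assumption)

    # Between-class scatter S_b = sum_i (mu_i - mu)(mu_i - mu)^T
    diff = means - grand_mean
    S_b = diff.T @ diff

    # Write S_w^{-1} = L L^T (Cholesky). Then S_b w = lambda S_w w becomes the
    # symmetric problem (L^T S_b L) u = lambda u with w = L u.
    L = np.linalg.cholesky(inv_sum)
    evals, U = np.linalg.eigh(L.T @ S_b @ L)
    order = np.argsort(evals)[::-1][:n_cvs]  # largest eigenvalues first
    W = L @ U[:, order]
    W /= np.linalg.norm(W, axis=0)  # unit-norm coefficient vector per CV
    return evals[order], W


# Toy usage: three synthetic "states" that differ only along feature 0.
rng = np.random.default_rng(0)
centers = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
X = np.vstack([rng.normal(c, 0.5, size=(200, 3)) for c in centers])
labels = np.repeat([0, 1, 2], 200)

evals, W = hlda_cvs(X, labels)
cv_values = (X - X.mean(axis=0)) @ W  # per-frame CV values, shape (600, 2)
```

Projecting frames onto the columns of `W` gives per-frame CV values. Because S_b has rank at most C - 1 for C states, `n_cvs` defaults to C - 1; in this toy case the first CV points almost entirely along feature 0, the only feature that separates the states.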
Biophysics
What problem does this paper attempt to address?
This paper addresses the challenge of efficiently sampling complex biomolecular systems in molecular dynamics (MD) simulations, where long-lived metastable states are separated by high energy barriers. Enhanced sampling traditionally relies on collective variables (CVs) chosen by intuition and prior knowledge, but this approach can introduce bias and limit mechanistic understanding of the system. The paper therefore pursues automated CV discovery. The researchers applied several machine learning algorithms, including Principal Component Analysis (PCA), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA), to MD simulation data of the MFSD2A transporter in multiple functionally relevant metastable states, with the goal of identifying optimal CVs that structurally discriminate these states. The study emphasizes the automated detection and interpretability of LDA-based CVs and introduces GDHLDA, a gradient descent-based multiclass variant of harmonic LDA. The results show that LDA methods outperform PCA in class separation and consistently extract the features that distinguish the metastable states of MFSD2A; in particular, conformational changes in transmembrane helix 7 (TM7) and in residue Y294 on this helix emerge as key discriminating features.
This suggests that LDA-based approaches are effective for automatically extracting functionally relevant CVs from MD trajectories. Such CVs can drive biased MD simulations that efficiently sample conformational transitions, and they could inform the design of future enhanced sampling studies aimed at elucidating the molecular mechanisms of state transitions in MFSD2A.
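The advantage of LDA over PCA in class separation follows from what each method optimizes: PCA picks the direction of largest total variance, whereas two-class Fisher LDA picks w proportional to S_w^{-1}(mu_A - mu_B), the direction that maximizes separation between states relative to fluctuations within them. The toy comparison below is a minimal sketch under stated assumptions: two synthetic states and two hypothetical features, not MFSD2A data. Ranking features by the magnitude of their LDA coefficient is shown as one common way to interpret a linear CV in terms of its input features (an assumption here, not necessarily the paper's procedure). The helper names `pca_direction`, `lda_direction`, and `fisher_ratio` are hypothetical.

```python
import numpy as np


def pca_direction(X):
    """Leading principal component: direction of maximum total variance."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return evecs[:, -1]  # eigh returns eigenvalues in ascending order


def lda_direction(XA, XB):
    """Two-class Fisher LDA direction, w proportional to S_w^{-1} (mu_A - mu_B)."""
    S_w = np.cov(XA, rowvar=False) + np.cov(XB, rowvar=False)
    w = np.linalg.solve(S_w, XA.mean(axis=0) - XB.mean(axis=0))
    return w / np.linalg.norm(w)


def fisher_ratio(XA, XB, w):
    """Separation of two states after projection onto w (higher is better)."""
    a, b = XA @ w, XB @ w
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())


rng = np.random.default_rng(1)
# Hypothetical feature 0: large fluctuations shared by both states (not discriminative).
# Hypothetical feature 1: small fluctuations but a state-dependent shift.
XA = np.column_stack([rng.normal(0.0, 3.0, 500), rng.normal(-1.0, 0.5, 500)])
XB = np.column_stack([rng.normal(0.0, 3.0, 500), rng.normal(1.0, 0.5, 500)])
X = np.vstack([XA, XB])

w_pca = pca_direction(X)      # dominated by the high-variance feature 0
w_lda = lda_direction(XA, XB)  # dominated by the discriminating feature 1
ranking = np.argsort(-np.abs(w_lda))  # features ordered by |LDA coefficient|

print("PCA separation:", fisher_ratio(XA, XB, w_pca))
print("LDA separation:", fisher_ratio(XA, XB, w_lda))
```

Coefficient magnitudes are only comparable when features share a common scale, so descriptors are often standardized before fitting when this kind of ranking is used.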