Exploring Transition States of Protein Conformational Changes via Out-of-Distribution Detection in the Hyperspherical Latent Space

Xuhui Huang, Bojun Liu, Jordan G. Boysen, Ilona Christy Unarta, Xuefeng Du, Yixuan Li
DOI: https://doi.org/10.26434/chemrxiv-2024-r8gjv
2024-01-23
Abstract: Identifying transitional states is crucial for understanding protein conformational changes that underlie numerous fundamental biological processes. Markov state models (MSMs) constructed from Molecular Dynamics (MD) simulations have demonstrated considerable success in studying protein conformational changes, which are often associated with rare events transiting over free energy barriers. However, it remains challenging for MSMs to identify the transition states, as they group MD conformations into discrete metastable states and do not provide information on transition states lying at the top of free energy barriers between metastable states. Inspired by recent advances in trustworthy artificial intelligence (AI) for detecting out-of-distribution (OOD) data, we present Transition State identification via Dispersion and vAriational principle Regularized neural neTworks (TS-DART). This deep learning approach effectively detects the transition states from MD simulations using hyperspherical embeddings in the latent space. The key insight of TS-DART is to treat the transition state structures as OOD data, recognizing that the transition states are less populated and exhibit a distributional shift from metastable states. Our TS-DART method offers an end-to-end pipeline for identifying transition states from MD simulations. By introducing a dispersion loss function to regularize the hyperspherical latent space, TS-DART can discern transition state conformations that separate multiple metastable states in an MSM. Furthermore, TS-DART provides hyperspherical latent representations that preserve all relevant kinetic geometries of the original dynamics. We demonstrate the power of TS-DART by applying it to a 2D potential, alanine dipeptide, and the translocation of a DNA motor protein on DNA. In all these systems, TS-DART outperforms previous methods in identifying transition states. As TS-DART integrates the dimensionality reduction, state decomposition, and transition state identification in a unified framework, we anticipate that it will be applicable for studying transition states of protein conformational changes.
Chemistry
What problem does this paper attempt to address?
The paper addresses the problem of identifying transition state structures during protein conformational changes. It proposes a method called TS-DART (Transition State identification via Dispersion and vAriational principle Regularized neural neTworks), which uses deep learning to detect transition states from molecular dynamics (MD) simulations.

### Main Issues:

1. **Transition State Identification Challenge**: Although Markov state models (MSMs) have been very successful in studying protein conformational changes, they struggle to identify transition states. MSMs group MD conformations into discrete metastable states and therefore provide no information about the transition states at the top of the free energy barriers between those states.
2. **Low Population and Distribution Shift**: Transition state structures are sparsely populated because they sit at the top of free energy barriers, and their distribution is shifted relative to that of the metastable states.

### Solution:

- **Deep Learning Framework**: TS-DART identifies transition states by introducing a dispersion loss function that regularizes the hyperspherical embeddings in the latent space.
- **Hyperspherical Embedding Representation**: TS-DART uses the penultimate layer of a deep neural network as the hyperspherical embedding of biomolecular conformations, and regularizes these embeddings by jointly optimizing the VAMP-2 loss and the dispersion loss.
- **Automatic Identification**: By computing the cosine similarity between each hyperspherical embedding and the metastable state centers, TS-DART automatically identifies the transition state conformations located at the top of the free energy barriers between metastable states.

### Method Advantages:

- **End-to-End Framework**: TS-DART integrates dimensionality reduction, state decomposition, and transition state identification in a single unified framework.
- **Superior Performance**: TS-DART outperforms previous methods on all systems tested: a 2D potential, alanine dipeptide, and the translocation of a DNA motor protein along double-stranded DNA.

With this method, researchers can better characterize the key transition states in protein conformational changes, which is relevant to fields such as drug design and enzyme engineering.
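
The cosine-similarity scoring described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The toy 2D embeddings, the state labels, the threshold of 0.5, and all function names are assumptions made for illustration. In TS-DART the embeddings and state assignments come from the trained network, which is omitted here. The `dispersion_loss` function uses one form found in the hyperspherical OOD-detection literature (the log of the mean exponentiated similarity between distinct centers), and whether TS-DART uses exactly this form is also an assumption.

```python
import numpy as np

def normalize(z):
    """Project vectors onto the unit hypersphere (L2 normalization)."""
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def state_centers(z, labels, n_states):
    """Mean embedding of each metastable state, re-normalized to unit length."""
    means = np.stack([z[labels == k].mean(axis=0) for k in range(n_states)])
    return normalize(means)

def max_cosine_similarity(z, centers):
    """Largest cosine similarity between each embedding and any state center.

    Both inputs hold unit vectors, so the dot product equals the cosine.
    """
    return (z @ centers.T).max(axis=1)

def dispersion_loss(centers, tau=0.1):
    """Assumed form: mean over centers of log-mean-exp similarity to the others.

    Lower values mean the centers are spread further apart on the sphere.
    """
    c = centers.shape[0]
    sim = centers @ centers.T / tau
    # Drop the diagonal so each row keeps only similarities to other centers.
    off_diag = sim[~np.eye(c, dtype=bool)].reshape(c, c - 1)
    return float(np.mean(np.log(np.mean(np.exp(off_diag), axis=1))))

# Toy data (hypothetical): two metastable clusters on a circle, one frame between.
rng = np.random.default_rng(0)
angles = np.concatenate([
    rng.normal(0.0, 0.1, 50),    # frames in metastable state 0
    rng.normal(np.pi, 0.1, 50),  # frames in metastable state 1
    [np.pi / 2],                 # one frame halfway between the two states
])
raw = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])
embeddings = normalize(raw)

# Centers are estimated from the frames assigned to metastable states only.
labels = np.array([0] * 50 + [1] * 50)
centers = state_centers(embeddings[:100], labels, n_states=2)

# Frames far from every center (low similarity) are transition state candidates.
scores = max_cosine_similarity(embeddings, centers)
threshold = 0.5  # hypothetical cutoff, not a value from the paper
candidates = np.flatnonzero(scores < threshold)
print("transition state candidate frames:", candidates)
print("dispersion loss of the two centers:", dispersion_loss(centers))
```

Frames inside a cluster score close to 1 against their own center, while the frame placed between the clusters scores near 0 against both and is the only one flagged. In practice the cutoff would have to be chosen from the score distribution of the system under study.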