Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang, Jian Tang
2024-03-12
Abstract: The dynamic nature of proteins is crucial for determining their biological functions and properties, for which Monte Carlo (MC) and molecular dynamics (MD) simulations stand as predominant tools to study such phenomena. By utilizing empirically derived force fields, MC or MD simulations explore the conformational space through numerically evolving the system via Markov chain or Newtonian mechanics. However, the high-energy barrier of the force fields can hamper the exploration of both methods by the rare event, resulting in inadequately sampled ensemble without exhaustive running. Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training, which suffers from high data acquisition cost and poor generalizability. Inspired by simulated annealing, we propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariant property. Our method leverages an amortized denoising score matching objective trained on general crystal structures and has no reliance on simulation data during both training and inference. Experimental results across several benchmarking protein systems demonstrate that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster compared to long MD simulations. Our open-source implementation is available at https://github.com/lujiarui/Str2Str
Quantitative Methods, Machine Learning, Biomolecules
What problem does this paper attempt to address?
This paper addresses protein conformation sampling, which is crucial for understanding the biological functions and properties of proteins. Monte Carlo (MC) and molecular dynamics (MD) simulations are the predominant tools, but high energy barriers in the force fields make transitions rare events, so an ensemble can remain inadequately sampled unless the simulation is run exhaustively. Existing learning-based methods sample directly but rely on target-specific simulation data for training, which is costly to acquire and generalizes poorly. The paper proposes Str2Str, a score-based structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariance. The model is trained with an amortized denoising score matching objective on general crystal structures and uses no simulation data during either training or inference. Inspired by simulated annealing, Str2Str forms a forward-backward process: an input structure is randomly perturbed, then annealed back under the guidance of the learned score. Because the framework is equivariant, rotating or translating the input structure rotates or translates the sampled conformations accordingly. Experimental results on several benchmark protein systems show that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster than long MD simulations. In this way, the paper avoids the cost of acquiring simulation data and offers an efficient tool for studying protein conformations.
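The perturb-then-anneal idea can be illustrated with a toy sketch. This is not the authors' implementation: it works on raw Cartesian coordinates, ignores equivariance, and replaces the learned score network with a hypothetical analytic score (`toy_score`) of a Gaussian centered on the input structure. The function names, the noise schedule, and all parameter values are assumptions chosen only for illustration, and the backward pass uses plain annealed Langevin dynamics rather than whatever sampler the paper uses.

```python
import numpy as np

def toy_score(x, sigma, center, spread=0.5):
    # Stand-in for a learned score network (assumption): the exact score of
    # N(center, spread^2 I) after blurring with Gaussian noise of scale sigma.
    return -(x - center) / (spread**2 + sigma**2)

def perturb(x0, sigma_max, rng):
    # Forward process: add Gaussian noise of scale sigma_max to the input.
    return x0 + sigma_max * rng.standard_normal(x0.shape)

def anneal(x, center, sigma_max, sigma_min=0.01, n_levels=20, n_steps=10,
           step_scale=0.1, rng=None):
    # Backward process: annealed Langevin dynamics over a decreasing
    # geometric noise schedule, guided by the score at each noise level.
    rng = np.random.default_rng() if rng is None else rng
    for sigma in np.geomspace(sigma_max, sigma_min, n_levels):
        step = step_scale * sigma**2  # smaller steps at lower noise
        for _ in range(n_steps):
            noise = rng.standard_normal(x.shape)
            x = x + step * toy_score(x, sigma, center) + np.sqrt(2.0 * step) * noise
    return x

def rmsd(a, b):
    # Root-mean-square deviation over atoms, without superposition.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((50, 3))  # hypothetical 50-atom input structure
x_noisy = perturb(x0, sigma_max=2.0, rng=rng)
x_new = anneal(x_noisy, center=x0, sigma_max=2.0, rng=rng)
print(f"RMSD after perturbation: {rmsd(x_noisy, x0):.2f}")
print(f"RMSD after annealing:    {rmsd(x_new, x0):.2f}")
```

In this toy setting, a larger `sigma_max` pushes the perturbed structure further from the input before annealing, which loosely mirrors the trade-off between exploration and fidelity that a perturbation scale controls in a forward-backward sampler.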