Precise Generation of Conformational Ensembles for Intrinsically Disordered Proteins via Fine-tuned Diffusion Models

Junjie Zhu, Zhengxin Li, Bo Zhang, Zhuoqi Zheng, Bozitao Zhong, Jie Bai, Xiaokun Hong, Taifeng Wang, Ting Wei, Jianyi Yang, Hai-Feng Chen
DOI: https://doi.org/10.1101/2024.05.05.592611
2024-09-13
Abstract: Intrinsically disordered proteins (IDPs) play pivotal roles in various biological functions and are closely linked to many human diseases, including cancer, diabetes and Alzheimer's disease. Structural investigations of IDPs typically involve a combination of molecular dynamics (MD) simulations and experimental data to correct for intrinsic biases in simulation methods. However, these simulations are hindered by their high computational cost and a scarcity of experimental data, severely limiting their applicability. Despite recent advancements in structure prediction for structured proteins, understanding the conformational properties of IDPs remains challenging, partly due to the poor conservation of disordered protein sequences and limited experimental characterization. Here, we introduce IDPFold, a method capable of generating conformational ensembles for IDPs directly from their sequences using fine-tuned diffusion models. IDPFold bypasses the need for multiple sequence alignments (MSAs) or experimental data, achieving accurate predictions of ensemble properties across numerous IDPs. By sampling conformations at the backbone level, IDPFold provides more detailed structural features and more precise property estimation compared to other state-of-the-art methods. IDPFold is ready to be used to elucidate the sequence-disorder-function paradigm of IDPs.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to accurately generate conformational ensembles of intrinsically disordered proteins (IDPs). IDPs play key roles in various biological functions and are closely associated with many human diseases, such as cancer, diabetes, and Alzheimer's disease. Because IDPs lack a single stable structure, they are usually studied by combining molecular dynamics (MD) simulations with experimental data that correct for intrinsic biases in the simulation methods. This approach is limited by the high computational cost of the simulations and by the scarcity of experimental data. Despite significant recent progress in structure prediction for structured proteins, understanding the conformational properties of IDPs remains challenging, partly because disordered protein sequences are poorly conserved and experimental characterization is limited. To this end, the authors propose IDPFold, a method based on fine-tuned diffusion models that generates conformational ensembles directly from IDP sequences. IDPFold requires neither multiple sequence alignments (MSAs) nor experimental data. By sampling conformations at the backbone level, it is reported to provide more detailed structural features and more precise property estimates than other state-of-the-art methods.
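To make the notion of an "ensemble property" concrete, the sketch below averages a per-conformation quantity over a set of conformations. It is an illustration only and does not reproduce IDPFold: the radius of gyration is chosen here as an assumed example property (the abstract does not name specific properties), and `toy_random_walk` is a hypothetical stand-in for sampled backbone coordinates, not output from the model.

```python
import math
import random


def radius_of_gyration(coords):
    """Radius of gyration of one conformation, treating all atoms as equal mass."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


def ensemble_mean_rg(ensemble):
    """Average the radius of gyration over every conformation in the ensemble."""
    values = [radius_of_gyration(conf) for conf in ensemble]
    return sum(values) / len(values)


def toy_random_walk(n_atoms, step=3.8, rng=None):
    """Hypothetical placeholder conformation: a 3D random walk with fixed step length.

    The 3.8 angstrom default roughly matches a C-alpha to C-alpha distance;
    this is an assumption for illustration, not a model of a real IDP.
    """
    rng = rng or random.Random()
    pos = (0.0, 0.0, 0.0)
    coords = [pos]
    for _ in range(n_atoms - 1):
        # Draw a random direction by normalising a Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        pos = tuple(p + step * c / norm for p, c in zip(pos, v))
        coords.append(pos)
    return coords


if __name__ == "__main__":
    rng = random.Random(0)
    ensemble = [toy_random_walk(50, rng=rng) for _ in range(100)]
    print(f"ensemble-averaged Rg: {ensemble_mean_rg(ensemble):.2f} angstrom")
```

In practice the `ensemble` list would be replaced by coordinates from a generative model or a simulation, and the averaged value compared against an experimental estimate where one exists.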