Direct prediction of intrinsically disordered protein conformational properties from sequences

Jeffrey M. Lotthammer, Garrett M. Ginell, Daniel Griffith, Ryan J. Emenecker, Alex S. Holehouse
DOI: https://doi.org/10.1038/s41592-023-02159-5
IF: 48
2024-02-02
Nature Methods
Abstract: Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence–ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
biochemical research methods
What problem does this paper attempt to address?
This paper addresses the following problem: **how to predict the ensemble conformational properties of intrinsically disordered regions (IDRs) directly from protein sequence**. IDRs are ubiquitous across all domains of life and play diverse functional roles, but unlike folded domains they do not adopt a single stable three-dimensional structure; they exist as an ensemble of interconverting conformations. Because of this structural heterogeneity, IDRs are largely absent from the Protein Data Bank (PDB), which has contributed to a lack of computational methods for predicting their ensemble properties from sequence. To fill this gap, the authors combined rational sequence design, large-scale molecular simulations, and deep learning to develop ALBATROSS, a deep-learning model that predicts ensemble dimensions of IDRs directly from sequence: the radius of gyration, end-to-end distance, polymer-scaling exponent, and ensemble asphericity. Predictions can be made at proteome-wide scale, and the model is available both as a locally installable software package and through Google Colab notebooks.
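To make the predicted quantities concrete, the sketch below computes three of them (radius of gyration, end-to-end distance, and asphericity) from explicit coordinates using their standard polymer-physics definitions. It is not ALBATROSS and does not use its API: ALBATROSS predicts these values from sequence alone, whereas this example assumes an ensemble of coordinates is already available. The function names, the toy random-walk ensemble, and the averaging conventions (root-mean-square for distances, per-conformer mean for asphericity) are all illustrative assumptions. The polymer-scaling exponent is omitted because it requires a fit over internal distances.

```python
import numpy as np


def radius_of_gyration(coords):
    """Rg of one conformation; coords has shape (n_beads, 3), equal bead masses assumed."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def end_to_end_distance(coords):
    """Distance between the first and last bead of one conformation."""
    return float(np.linalg.norm(coords[-1] - coords[0]))


def asphericity(coords):
    """Asphericity from the gyration tensor eigenvalues: 0 for a sphere-like shape, 1 for a rod."""
    centered = coords - coords.mean(axis=0)
    gyration_tensor = centered.T @ centered / len(coords)
    l1, l2, l3 = np.linalg.eigvalsh(gyration_tensor)
    return float(1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / (l1 + l2 + l3) ** 2)


def ensemble_summary(ensemble):
    """Summarize an ensemble of shape (n_conformations, n_beads, 3).

    Assumed conventions: root-mean-square averages for Rg and end-to-end
    distance, arithmetic mean of per-conformer asphericity.
    """
    rg = np.array([radius_of_gyration(c) for c in ensemble])
    re = np.array([end_to_end_distance(c) for c in ensemble])
    asph = np.array([asphericity(c) for c in ensemble])
    return {
        "radius_of_gyration": float(np.sqrt((rg ** 2).mean())),
        "end_to_end_distance": float(np.sqrt((re ** 2).mean())),
        "asphericity": float(asph.mean()),
    }


# Toy ensemble (hypothetical data): 200 Gaussian random walks of 50 beads each.
rng = np.random.default_rng(0)
toy_ensemble = np.cumsum(rng.normal(size=(200, 50, 3)), axis=1)

if __name__ == "__main__":
    for name, value in ensemble_summary(toy_ensemble).items():
        print(f"{name}: {value:.3f}")
```

Different studies average these quantities in different ways (for example, mean Rg versus root-mean-square Rg), so the convention should be matched to the reference data before comparing values.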