PDFll: Predictors of Disorder and Function of Proteins from the Language of Life

Wanyi Yang, Qingsong Du, Xunyu Zhou, Chuanfang Wu, Jinku Bao
DOI: https://doi.org/10.1089/cmb.2024.0506
2024-09-09
Abstract: The identification of intrinsically disordered proteins and their functional roles depends largely on the performance of computational predictors, so these tools must meet a high standard of accuracy. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to predict protein disorder and the associated functional roles directly from protein sequences. PDFll is developed in two steps. First, it leverages large-scale protein language models (pLMs) trained on an extensive dataset comprising billions of protein sequences. Second, the embeddings derived from the pLMs are fed into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass those of existing state-of-the-art predictors, particularly predictors that forecast disorder and function without using evolutionary information.
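The two-step design described in the abstract (per-residue embeddings first, a lightweight prediction model second) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the PDFll implementation: `embed_residues` is a hypothetical stand-in that hashes each residue with its neighbours instead of calling a real pLM, `EMBED_DIM` is an arbitrary small size, and the prediction head is an untrained, randomly initialised logistic layer rather than the authors' deep-learning model.

```python
import hashlib
import math
import random

EMBED_DIM = 8  # hypothetical; real pLM embeddings are far larger


def embed_residues(sequence):
    """Stand-in for step 1: return one fixed-length vector per residue.

    A real pipeline would call a pretrained pLM here. This placeholder only
    derives a deterministic pseudo-embedding from each residue and its two
    neighbours, so the example runs without any model weights.
    """
    padded = "-" + sequence + "-"
    embeddings = []
    for i in range(1, len(padded) - 1):
        window = padded[i - 1:i + 2]
        digest = hashlib.sha256(window.encode("ascii")).digest()
        # Map the first EMBED_DIM bytes (0..255) to floats in [-1, 1].
        embeddings.append([b / 127.5 - 1.0 for b in digest[:EMBED_DIM]])
    return embeddings


def init_head(dim, seed=0):
    """Stand-in for step 2: an untrained linear head with random weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    bias = 0.0
    return weights, bias


def predict_disorder(sequence, head):
    """Return one score in (0, 1) per residue via a logistic output."""
    weights, bias = head
    scores = []
    for vec in embed_residues(sequence):
        logit = sum(w * x for w, x in zip(weights, vec)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


if __name__ == "__main__":
    head = init_head(EMBED_DIM)
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    for residue, score in zip(seq, predict_disorder(seq, head)):
        print(residue, f"{score:.3f}")
```

Because the head is untrained, the printed scores carry no biological meaning; the sketch only shows the data flow, in which the sequence is the sole input and no alignment or other evolutionary information is consulted.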