Machine learning approaches in predicting allosteric sites

Francho Nerín-Fonz, Zoe Cournia
DOI: https://doi.org/10.26434/chemrxiv-2024-4gf9b
2024-01-17
Abstract: Allosteric regulation is a fundamental biological mechanism that can control critical cellular processes via allosteric modulator binding to protein distal functional sites. The advantages of allosteric modulators over orthosteric ones have sparked the development of numerous computational approaches, such as the identification of allosteric binding sites, to facilitate allosteric drug discovery. Building on the success of Machine Learning (ML) models for solving complex problems in biology and chemistry, several ML models for predicting allosteric sites have been developed. In this review, we provide an overview of these models and discuss future perspectives powered by the field of Artificial Intelligence such as protein Language Models.
Chemistry
What problem does this paper attempt to address?
The problem this paper addresses is **how to use machine-learning methods to predict the allosteric sites of proteins**. Specifically, the article reviews machine-learning (ML) models that identify candidate allosteric sites in proteins, with the aim of supporting allosteric drug design and discovery.

Allosteric regulation is an important biological mechanism that controls key cellular processes through the binding of allosteric modulators at sites distal to a protein's functional site. Compared with orthosteric modulators, allosteric modulators can offer advantages such as higher selectivity and lower toxicity, which makes allosteric sites important targets in drug design.

The article points out that studying allosteric sites experimentally is challenging and that such sites are often discovered by accident. To overcome these difficulties, researchers have developed a variety of computational methods, in particular ML models, for predicting potential allosteric sites. These models are trained on existing protein-structure data and allosteric-site annotations, and use features such as physicochemical properties and network analysis to improve prediction accuracy.

In summary, the main purposes of this paper are:

1. **To review the progress of existing machine-learning models in predicting allosteric sites**.
2. **To discuss the advantages and limitations of these models**.
3. **To look ahead to future research directions**, especially the prospects for newer artificial-intelligence tools such as protein language models (pLMs).

Through these efforts, researchers hope to discover new allosteric sites more efficiently and thereby accelerate allosteric drug research and development.
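As a rough illustration of what a "network analysis" feature can look like, the sketch below builds a residue contact graph from C-alpha coordinates and averages degree centrality over the residues lining a candidate pocket. It is not taken from the paper or from any specific model it reviews: the function names, the 8 Å cutoff, and the toy coordinates are all assumptions made for illustration, and a real predictor would combine many such features with physicochemical descriptors before training a classifier.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Build adjacency sets: residues i and j are linked when their
    C-alpha distance is at most `cutoff` (8 angstroms is an assumed value)."""
    n = len(coords)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

def degree_centrality(neighbors):
    """Fraction of the other residues that each residue is in contact with."""
    n = len(neighbors)
    if n < 2:
        return {i: 0.0 for i in neighbors}
    return {i: len(nbrs) / (n - 1) for i, nbrs in neighbors.items()}

def pocket_feature(centrality, pocket_residues):
    """Mean centrality over the residues lining one candidate pocket."""
    return sum(centrality[i] for i in pocket_residues) / len(pocket_residues)

# Toy C-alpha coordinates (made up for illustration, not real protein data).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(coords)
centrality = degree_centrality(graph)
feature = pocket_feature(centrality, [0, 1])  # hypothetical pocket: residues 0 and 1
print(feature)
```

A value like `feature` would be one column in a per-pocket feature table; labels for training would come from annotated allosteric sites, as the summary describes.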