Abstract: Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules that induce the degradation of a target protein by recruiting an E3 ligase to it. Because PROTACs can inactivate disease-related genes that remain understudied, they have great potential as a new type of therapy for currently incurable diseases. However, only a few hundred proteins have been tested experimentally for their amenability to PROTACs, and it remains unclear which other proteins in the human genome can be targeted. For the first time, we have developed an interpretable machine learning model, PrePROTAC, which combines a transformer-based protein sequence descriptor with random forest classification to predict, genome-wide, the targets that PROTACs can degrade via CRBN, one of the E3 ligases. In benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, a PR-AUC of 0.84, and over 40% sensitivity at a false positive rate of 0.05. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method to identify positions in the protein structure that play key roles in PROTAC activity. The key residues identified were consistent with existing knowledge. We applied PrePROTAC to identify more than 600 novel understudied proteins that are potentially degradable by CRBN, and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.

Author summary: Many human diseases remain incurable because disease-causing genes cannot be selectively and effectively targeted by small molecules. The proteolysis-targeting chimera (PROTAC), an organic compound that binds both a target and a degradation-mediating E3 ligase, has emerged as a promising approach to selectively target disease-driving genes that are not druggable by small molecules. Nevertheless, not all proteins can be accommodated by E3 ligases and effectively degraded. Knowledge of a protein's degradability is therefore crucial for the design of PROTACs. However, only a few hundred proteins have been tested experimentally for their amenability to PROTACs, and it remains unclear which other proteins in the human genome can be targeted. In this paper, we propose an interpretable machine learning model, PrePROTAC, that takes advantage of powerful protein language modeling. PrePROTAC achieves high accuracy when evaluated on an external dataset whose proteins come from gene families different from those in the training data, which suggests that the model generalizes. We apply PrePROTAC to the human genome and identify more than 600 understudied proteins that are potentially responsive to PROTACs. Furthermore, we design three PROTAC compounds for novel drug targets associated with Alzheimer's disease.
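One of the benchmark figures quoted in the abstract, sensitivity at a false positive rate of 0.05, may be less familiar than ROC-AUC. The sketch below shows one common way to compute it from classifier scores: lower the decision threshold step by step and keep the highest true positive rate reached while the false positive rate stays at or below the cap. The function name `sensitivity_at_fpr`, the 0/1 label encoding, and the toy scores are assumptions made for illustration; they are not taken from the PrePROTAC code or data.

```python
def sensitivity_at_fpr(labels, scores, max_fpr=0.05):
    """Highest true positive rate reachable while FPR <= max_fpr.

    labels: 0/1 values (1 = degradable; hypothetical encoding)
    scores: classifier scores, higher means more likely positive
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume every example tied at this score, so that a threshold
        # never separates examples with identical scores.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR can only grow as the threshold is lowered further
        best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Toy data (invented): 5 positives and 20 negatives.
toy_scores = [0.95, 0.90, 0.85, 0.40, 0.30] + [0.88, 0.60] + [0.20] * 18
toy_labels = [1, 1, 1, 1, 1] + [0, 0] + [0] * 18

if __name__ == "__main__":
    print(sensitivity_at_fpr(toy_labels, toy_scores, max_fpr=0.05))
```

With 20 negatives, a cap of 0.05 tolerates exactly one false positive, so the toy example counts every positive that scores above the second-highest negative. On real data the result depends strongly on the number of negatives, because that number fixes how finely the false positive rate can be resolved.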

Modeling PROTAC Degradation Activity with Machine Learning

PROTACs: an Emerging Targeting Technique for Protein Degradation in Drug Discovery

DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs

Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning

Predicting Degradation Potential of Protein Targeting Chimeras

Interpretable PROTAC degradation prediction with structure-informed deep ternary attention framework

De novo PROTAC design using graph-based deep generative models

Accelerated Rational PROTAC Design Via Deep Learning and Molecular Simulations

Improved Accuracy for Modeling PROTAC-Mediated Ternary Complex Formation and Targeted Protein Degradation via New In Silico Methodologies.

AI-DPAPT: a machine learning framework for predicting PROTAC activity

The Present and Future of Novel Protein Degradation Technology

Integrative Modeling of PROTAC-Mediated Ternary Complexes

Homobivalent, Trivalent, and Covalent PROTACs: Emerging Strategies for Protein Degradation

Recent Advances of Degradation Technologies Based on PROTAC Mechanism

Importance of Three-Body Problems and Protein-Protein Interactions in Proteolysis-Targeting Chimera Modeling: Insights from Molecular Dynamics Simulations

In Vivo Target Protein Degradation Induced by PROTACs Based on E3 Ligase DCAF15

PROTACable Is an Integrative Computational Pipeline of 3-D Modeling and Deep Learning To Automate the De Novo Design of PROTACs

[Induced degradation of proteins by PROTACs and other strategies: towards promising drugs]

A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design

PROTACs: Great Opportunities for Academia and Industry (an Update from 2020 to 2021).

From Conception to Development: Investigating PROTACs Features for Improved Cell Permeability and Successful Protein Degradation