Interpretable PROTAC degradation prediction with structure-informed deep ternary attention framework
Zhenglu Chen, Chunbin Gu, Shuoyan Tan, Xiaorui Wang, Yuquan Li, Mutian He, Ruiqiang Lu, Shijia Sun, Chang-Yu Hsieh, Xiaojun Yao, Huanxiang Liu, Pheng-Ann Heng
DOI: https://doi.org/10.1101/2024.11.05.622005
2024-11-08
Abstract: Proteolysis Targeting Chimeras (PROTACs) are heterobifunctional ligands that form ternary complexes with Proteins of Interest (POIs) and E3 ligases, exploiting the ubiquitin-proteasome system to degrade disease-causing proteins and promising to drug the undruggable. While PROTAC research primarily relies on costly and time-consuming wet-lab experiments, deep learning offers a promising avenue to accelerate development and reduce expenses. However, existing deep learning methods for PROTAC degradation prediction often overlook hierarchical molecular representation and protein structural information, which hinders effective data modeling. Moreover, their black-box nature limits the interpretability of computational outcomes, so they fail to provide intuitive insights into substructure interactions within the PROTAC system. This study introduces PROTAC-STAN, a structure-informed deep ternary attention network (STAN) framework for interpretable PROTAC degradation prediction. PROTAC-STAN represents PROTAC molecules across atom, molecule, and property hierarchies and incorporates structure information for POIs and E3 ligases using a protein language model infused with structural data. Furthermore, it simulates interactions among the three entities at the atom and amino acid levels via a novel ternary attention network tailored for the PROTAC system, enhancing interpretability. By integrating hierarchical PROTAC molecule representation, structural embeddings of the POI and E3 ligase, and ternary attention modeling of their interactions, our approach improves prediction accuracy by 10.95% while enabling model interpretability via atom- and residue-level visualization of molecules and complexes. Experiments on the refined public PROTAC dataset demonstrate that PROTAC-STAN outperforms state-of-the-art baselines in overall performance.
The excellent performance of PROTAC-STAN is anticipated to establish it as a foundational tool in future research on PROTAC-related drugs, thereby accelerating the development of this field.
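To make the idea of attention over three entities concrete, the following is a minimal sketch in plain Python. It is not the PROTAC-STAN architecture, whose details the abstract does not give. It assumes a simple trilinear score (the sum over feature dimensions of the elementwise product of one atom vector, one POI residue vector, and one E3 residue vector) followed by a softmax over all triples. The function names `ternary_attention` and `atom_importance`, and the toy feature vectors, are hypothetical and chosen only for illustration.

```python
import math


def ternary_attention(protac, poi, e3):
    """Assumed trilinear attention over (atom, POI residue, E3 residue) triples.

    Each argument is a list of feature vectors of equal length. Returns a
    nested list attn[i][j][k] whose entries sum to 1 over all triples.
    """
    dim = len(protac[0])
    # Trilinear score: sum over features of the elementwise triple product.
    scores = [[[sum(a[d] * p[d] * e[d] for d in range(dim)) for e in e3]
               for p in poi] for a in protac]
    # Numerically stable softmax taken jointly over every triple.
    flat = [s for plane in scores for row in plane for s in row]
    top = max(flat)
    total = sum(math.exp(s - top) for s in flat)
    return [[[math.exp(s - top) / total for s in row] for row in plane]
            for plane in scores]


def atom_importance(attn):
    """Sum the attention map over both proteins to get one weight per atom."""
    return [sum(sum(row) for row in plane) for plane in attn]


# Toy inputs (hypothetical): 3 PROTAC atoms, 2 POI residues, 2 E3 residues.
protac = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
poi = [[1.0, 0.0], [0.2, 0.8]]
e3 = [[0.9, 0.1], [0.0, 1.0]]

attn = ternary_attention(protac, poi, e3)
weights = atom_importance(attn)
```

Summing the map over the two protein axes, as `atom_importance` does, yields per-atom weights of the kind that could be drawn on a molecule; summing over the other axes would give per-residue weights for the POI or the E3 ligase in the same way.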
Bioinformatics