SeqGO-CPA: Improving Compound-Protein Binding Affinity Prediction with Sequence Information and Gene Ontology Knowledge

Chunyu Wang, Yan Zhu, Naifeng Wen, Lingling Zhao, Junjie Wang
DOI: https://doi.org/10.1109/bibm52615.2021.9669555
2021-01-01
Abstract: Compound-protein binding affinity (CPA) prediction is vital for drug discovery and drug repurposing. Deep learning methods have been developed to model the complicated relationship between CPA and the sequences or structures of proteins and molecules. This study proposes a novel deep learning method, SeqGO-CPA, that integrates protein function knowledge, represented by Gene Ontology (GO) annotations, into CPA prediction. To capture the semantic information of GO annotations, a fine-tuned natural language processing model for the biomedical domain is used to encode the set of GO terms. Meanwhile, based on the observation that binding often occurs at sub-structures, our method uses a tokenization algorithm to learn sub-structure information of proteins and compounds from a large number of unlabeled sequences. Further, a deep neural network architecture involving a joint feature representation and a highway block is developed to enhance CPA prediction. The proposed model was evaluated on two public benchmark datasets in both standard cross-validation and blind split settings. The experimental results demonstrate that our method outperforms the deep learning-based baselines and that incorporating GO information further improves prediction performance.
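The abstract names a highway block but does not give its form. As a minimal sketch, assuming the standard highway layer formulation y = T(x) * H(x) + (1 - T(x)) * x, the gating idea can be illustrated in plain Python. The function names, the ReLU transform, the dimensions, and the weights below are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is a ReLU-activated affine transform (an assumption here) and
    T is a sigmoid gate that decides how much of H(x) replaces x.
    """
    h = [max(0.0, v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def random_matrix(n, rng):
    """Small random square matrix, used only to make the example runnable."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
dim = 4
x = [0.5, -1.0, 2.0, 0.0]  # stand-in for a joint compound-protein feature vector
w_h, w_t = random_matrix(dim, rng), random_matrix(dim, rng)
b_h = [0.0] * dim
b_t = [-20.0] * dim  # strongly negative gate bias: the layer nearly copies x

y = highway_layer(x, w_h, b_h, w_t, b_t)
```

With a strongly negative gate bias the layer passes its input through almost unchanged, which is why highway layers can be stacked without degrading the signal early in training; as the gate opens, the output moves toward the transformed features H(x).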