DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

Ingoo Lee, Jongsoo Keum, Hojung Nam
DOI: https://doi.org/10.1371/journal.pcbi.1007129
2019-06-14
PLoS Computational Biology
Abstract: Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative to predict DTIs accurately. Thus, in this study, we propose a deep learning-based DTI prediction model that captures local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on amino acid subsequences of various lengths to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Our model also performs better than recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.
Drugs work by interacting with target proteins to activate or inhibit a target's biological process. Therefore, identification of DTIs is a crucial step in drug discovery. However, identifying drug candidates via biological assays is time-consuming and costly, which introduces the need for a computational prediction approach for the identification of DTIs.
In this work, we constructed a novel DTI prediction model that extracts local residue patterns of target protein sequences using a CNN-based deep learning approach. As a result, the detected local features of protein sequences perform better than other protein descriptors for DTI prediction, and our model performs better than previous models on independent test datasets from PubChem. That is, our approach of capturing local residue patterns with a CNN successfully enriches protein features from a raw sequence.
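The idea of convolving over amino acid subsequences of several lengths and then pooling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the repository linked above). The one-hot encoding, the window sizes (3, 5, 7), the number of filters, and the random filter weights are all assumptions chosen for illustration, and the function names are hypothetical. In a trained model the filter weights would be learned from DTI data.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a (length, 20) one-hot matrix (assumed encoding)."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        encoded[pos, AA_INDEX[aa]] = 1.0
    return encoded


def conv_max_pool(encoded, filters):
    """Slide filters of shape (n_filters, window, 20) along the sequence.

    Returns, per filter, the global-max-pooled activation and the start
    position of the window that produced it.
    """
    _, window, _ = filters.shape
    n_windows = encoded.shape[0] - window + 1
    # Stack every subsequence of length `window`: (n_windows, window, 20)
    windows = np.stack([encoded[i:i + window] for i in range(n_windows)])
    # Dot product of each window with each filter: (n_windows, n_filters)
    activations = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
    activations = np.maximum(activations, 0.0)  # ReLU
    return activations.max(axis=0), activations.argmax(axis=0)


def protein_features(sequence, window_sizes=(3, 5, 7), n_filters=4, seed=0):
    """Concatenate pooled activations over several window sizes.

    Filter weights are random placeholders, not learned parameters.
    """
    rng = np.random.default_rng(seed)
    encoded = one_hot(sequence)
    pooled = []
    for window in window_sizes:
        filters = rng.normal(size=(n_filters, window, len(AMINO_ACIDS)))
        values, _ = conv_max_pool(encoded, filters)
        pooled.append(values)
    return np.concatenate(pooled)


if __name__ == "__main__":
    features = protein_features("MKTAYIAKQRQISFVKSHFSRQ")
    print(features.shape)
```

Because global max pooling keeps only the strongest response of each filter, the position returned by `conv_max_pool` indicates which subsequence a filter responded to most. That is the kind of signal one could inspect when examining pooled convolution results for candidate binding regions.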
biochemical research methods, mathematical & computational biology