Predicting Drug–Target Interactions Based on the Ensemble Models of Multiple Feature Pairs

Cheng Wang, Jun Zhang, Peng Chen, Bing Wang
DOI: https://doi.org/10.3390/ijms22126598
IF: 5.6
2021-06-20
International Journal of Molecular Sciences
Abstract: Background: The prediction of drug–target interactions (DTIs) is of great significance in drug development, but traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction, but it is limited by imbalanced datasets and the difficulty of selecting essential features. Methods: A prediction method based on the Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. First, three negative sets are generated according to the Euclidean distance of three feature pairs. Then, the negative samples of the validation set and the test set are randomly selected from the union of the three negative sets within the validation set and the test set, respectively. At the same time, the weighted ensemble model is optimized and applied to the test set. Results: The area under the receiver operating characteristic curve (AUC) exceeded 94.0% in three of the four sub-datasets of the gold standard datasets when predicting new drugs. The effectiveness of the proposed method is also shown by comparison with state-of-the-art methods and by a demonstration of predicted drug–target pairs. Conclusion: Ensemble-MFP can weigh the existing feature pairs and performs well for general prediction on new drugs.
biochemistry & molecular biology; chemistry, multidisciplinary
What problem does this paper attempt to address?
This paper addresses the prediction of drug–target interactions (DTIs) in drug development. Traditional experimental methods are time-consuming and costly; machine-learning methods can reduce the prediction cost but are limited by imbalanced data sets and the difficulty of selecting key features. Specifically, the paper aims to improve prediction for new drugs through an ensemble model based on multiple feature pairs (Ensemble-MFP), while addressing the problems of negative sample generation and feature-pair selection.

The methods proposed in the paper mainly include the following aspects:

1. **Negative sample generation**: Generate three negative sample sets according to the Euclidean distances of three feature pairs, and randomly select negative samples for the validation set and the test set from them to improve the reliability of negative samples.
2. **Ensemble model construction**: Train three sub-models using three different feature pairs, and combine these sub-models by optimizing the weights to form the final ensemble model.
3. **Data set partitioning**: Use 5-fold cross-validation to divide drugs proportionally into training sets, validation sets and test sets to ensure the generalization ability of the model for new drugs.

The main contributions of the paper are:

- Proposing an ensemble model method based on multiple feature pairs, which effectively addresses the problem of imbalanced data sets.
- Improving the prediction effect by optimizing the model weights, with particularly good performance in the prediction of new drugs.
- Outperforming existing methods on multiple benchmark data sets, especially on the GPCR and ion channel data sets.

In general, this paper provides a new and effective solution for the prediction of drug–target interactions, which helps to accelerate the drug development process.
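The negative-sample selection and the weighted combination of three sub-models can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary only states that Euclidean distance is used and that the ensemble weights are optimized, so two points here are assumptions: that a negative is an unlabeled pair far from its nearest known positive, and that the weights are found by a grid search scored by validation AUC. The function names (`select_negatives`, `optimize_weights`, and the helpers) and the toy data are hypothetical.

```python
import math
from itertools import product


def select_negatives(positives, unlabeled, k):
    """Return the k unlabeled feature vectors farthest from every positive.

    ASSUMPTION: "far from the nearest known positive" is one plausible
    reading of "generated according to the Euclidean distance"; the exact
    criterion is not given in the summary.
    """
    def nearest_positive_distance(x):
        return min(math.dist(x, p) for p in positives)

    return sorted(unlabeled, key=nearest_positive_distance, reverse=True)[:k]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combine(score_lists, weights):
    """Weighted sum of the sub-model scores for each sample."""
    return [sum(w * s for w, s in zip(weights, sample))
            for sample in zip(*score_lists)]


def optimize_weights(labels, score_lists, steps=10):
    """Grid-search three non-negative weights summing to 1 for the best AUC.

    ASSUMPTION: a coarse grid search on validation AUC stands in for
    whatever weight optimization the paper actually uses.
    """
    best_w, best_auc = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        w = (i / steps, j / steps, (steps - i - j) / steps)
        score = auc(labels, combine(score_lists, w))
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy feature vectors for one feature pair (hypothetical values).
positives = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.5, 0.1), (5.0, 5.0), (3.0, 0.0), (0.0, 4.0)]
negatives = select_negatives(positives, unlabeled, k=2)

# Validation labels and scores from three hypothetical sub-models.
val_labels = [1, 1, 0, 0]
val_scores = [
    [0.9, 0.8, 0.2, 0.1],
    [0.4, 0.6, 0.7, 0.3],
    [0.5, 0.5, 0.5, 0.5],
]
weights, val_auc = optimize_weights(val_labels, val_scores)
print(negatives, weights, val_auc)
```

In a fuller pipeline, `select_negatives` would run once per feature pair to produce the three negative sets, and the weights found on the validation set would then be applied unchanged to the test-set scores.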