Prediction of DTIs for High-Dimensional and Class-Imbalanced Data Based on CGAN.

Kang Yang, Zhongnan Zhang, Song He, Xiaochen Bo
DOI: https://doi.org/10.1109/bibm.2018.8621098
2018-01-01
Abstract: Drug-target interactions (DTIs) are an important issue in new drug discovery and drug repositioning. However, because datasets in the field are both highly imbalanced and high-dimensional, designing effective predictive methods is challenging. In this study, we model DTI prediction as a binary classification problem. First, the original positive and negative sample sets are constructed from the LINCS and DrugBank databases. Then a conditional generative adversarial network (CGAN) is applied to over-sample the original positive samples so that the numbers of positive and negative samples are balanced. The resulting class-balanced samples are used to train the classifier, which is finally applied to the class-unknown samples for DTI prediction. In the experimental section, the necessity of over-sampling is demonstrated. Comparisons of different over-samplers then show that the CGAN over-sampler has clear advantages over traditional samplers. Therefore, for high-dimensional and class-imbalanced datasets, CGAN over-sampling is more applicable to DTI prediction than traditional methods.
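The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature vectors, and the `generate` callable are all hypothetical, and the sampler used here is a simple Gaussian-jitter stand-in rather than a trained CGAN generator. The sketch only shows where a generator would plug in so that synthetic positives bring the two classes to equal size before a classifier is trained.

```python
import random


def balance_by_oversampling(positives, negatives, generate, seed=0):
    """Return (X, y) with synthetic positives added until classes are balanced.

    `generate(rng)` is a hypothetical callable that returns one synthetic
    positive feature vector. In the setting of the abstract it would wrap a
    trained CGAN generator conditioned on the positive label.
    """
    rng = random.Random(seed)
    # Number of synthetic positives needed to match the negative count.
    n_missing = max(0, len(negatives) - len(positives))
    synthetic = [generate(rng) for _ in range(n_missing)]
    X = list(positives) + synthetic + list(negatives)
    y = [1] * (len(positives) + len(synthetic)) + [0] * len(negatives)
    return X, y


def jitter_sampler(positives, scale=0.05):
    """Stand-in sampler (NOT a CGAN): copy a real positive and add Gaussian noise."""
    def generate(rng):
        base = rng.choice(positives)
        return [v + rng.gauss(0.0, scale) for v in base]
    return generate


# Toy data with made-up values: 2 positives vs 6 negatives, 3 features each.
positives = [[0.9, 0.8, 0.7], [1.0, 0.9, 0.6]]
negatives = [[0.1 * i, 0.2, 0.3] for i in range(6)]

X, y = balance_by_oversampling(positives, negatives, jitter_sampler(positives))
```

After the call, `X` holds the balanced sample set and `y` the matching labels, which could then be passed to any binary classifier. Swapping `jitter_sampler` for a callable backed by a trained generator would leave the rest of the pipeline unchanged.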