MultiCon: A Semi-Supervised Approach for Predicting Drug Function from Chemical Structure Analysis

Pracheta Sahoo, Indranil Roy, Zhuoyi Wang, Feng Mi, Lin Yu, Pradeep Balasubramani, Latifur Khan, J. Fraser Stoddart
DOI: https://doi.org/10.1021/acs.jcim.0c00801
Impact factor: 6.162
2020-11-03
Journal of Chemical Information and Modeling
Abstract: Semi-supervised learning has proven effective at exploiting large amounts of unlabeled data, reducing the need for extensive labeled data while improving model performance. Despite its tremendous potential, semi-supervised learning has yet to be applied in the field of drug discovery. Empirical testing and classification of drugs is costly and time-consuming. In contrast, predicting the therapeutic applications of drugs from their structural formulas with semi-supervised learning would significantly reduce both cost and time. Herein, we employ a new multicontrastive semi-supervised learning algorithm, MultiCon, to classify drugs into 12 categories according to their therapeutic applications, based on image analysis of their structural formulas. Through the rational use of data balancing, online augmentation of the drug image data during training, and a multicontrastive loss combined with consistency regularization, MultiCon achieves higher class prediction accuracy than state-of-the-art machine learning methods across a variety of existing semi-supervised learning benchmarks. It performs exceptionally well when only a limited number of labeled examples is available: for instance, with just 5000 labeled drugs in a PubChem data set (D₃), MultiCon achieved a class prediction accuracy of 97.74%.
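The abstract attributes MultiCon's accuracy partly to combining a contrastive objective with consistency regularization. Its exact formulation is not given here, so the following is only a generic sketch of how such objectives are commonly combined in semi-supervised image classification: a standard NT-Xent contrastive term (swapped in for MultiCon's multicontrastive loss, which this sketch does not reproduce), a FixMatch-style consistency term on confidently pseudo-labeled unlabeled images, and the usual supervised cross-entropy. All function names, the weights `lambda_u` and `lambda_c`, the temperature, and the 0.95 confidence threshold are hypothetical choices, not values from the paper.

```python
import numpy as np

# Hypothetical sketch: a generic contrastive + consistency objective,
# NOT the MultiCon loss described in the paper.


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Supervised loss on the small labeled subset."""
    return float(-log_softmax(logits)[np.arange(len(labels)), labels].mean())


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent contrastive loss: z1[i] and z2[i] embed two augmentations of
    the same image; every other embedding in the batch is a negative."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own positive
    n = z1.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    return float(-log_softmax(sim)[np.arange(2 * n), positives].mean())


def consistency_loss(weak_logits: np.ndarray, strong_logits: np.ndarray,
                     threshold: float = 0.95) -> float:
    """FixMatch-style consistency: pseudo-label each unlabeled image from its
    weakly augmented view and train the strongly augmented view to match,
    keeping only predictions whose confidence reaches the threshold."""
    probs = np.exp(log_softmax(weak_logits))
    pseudo_labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold
    ce = -log_softmax(strong_logits)[np.arange(len(pseudo_labels)), pseudo_labels]
    return float((ce * mask).mean())


def total_loss(labeled_logits, labels, weak_logits, strong_logits, z1, z2,
               lambda_u: float = 1.0, lambda_c: float = 1.0) -> float:
    """Weighted sum of supervised, consistency, and contrastive terms."""
    return (cross_entropy(labeled_logits, labels)
            + lambda_u * consistency_loss(weak_logits, strong_logits)
            + lambda_c * nt_xent_loss(z1, z2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes = 12  # the paper's 12 therapeutic categories
    loss = total_loss(
        labeled_logits=rng.normal(size=(8, n_classes)),
        labels=rng.integers(0, n_classes, size=8),
        weak_logits=rng.normal(size=(16, n_classes)),
        strong_logits=rng.normal(size=(16, n_classes)),
        z1=rng.normal(size=(16, 32)),
        z2=rng.normal(size=(16, 32)),
    )
    print(f"combined loss: {loss:.4f}")
```

In practice the logits and embeddings would come from an image model applied to differently augmented renderings of each structural formula; random arrays stand in for them here only to show the shapes involved.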
Subject categories: Chemistry, Multidisciplinary; Chemistry, Medicinal; Computer Science, Interdisciplinary Applications; Computer Science, Information Systems