A robust drug representation learning model for eliminating cell specificity in gene expression profile and its application.

Cecheng Zhao, Ziyang Huang, Hui Wang, Haitao Fu, Dong Wang, Yingjie Gao, Haotian Zhu, Xiaohui Niu, Wen Zhang
DOI: https://doi.org/10.1109/BIBM52615.2021.9669385
2021-01-01
Abstract: Learning high-quality drug representations is important for drug development and for understanding drug action mechanisms. Leveraging the gene expression profiles of drug-treated cells while eliminating cell specificity can facilitate drug representation learning. In this paper, we propose a four-stage deep learning model, abbreviated as “DGERN”, that learns drug representations by integrating gene expression profiles with the therapeutic use information of drugs. The stacked autoencoder module is employed for dimensionality reduction; the iterative clustering module is used to eliminate cell specificity; the subclass pre-training module and the label classifier module are utilized to integrate the therapeutic use information of drugs into the drug representations. Visualization of the drug representations shows that DGERN eliminates cell specificity and integrates the therapeutic use information of drugs effectively. The drug representations learned by DGERN are then used in downstream prediction tasks of drug development. In the drug-disease association prediction task, DGERN combined with random forest achieves the best performance, reaching an AUC of 0.67 and exceeding the 0.60 of the runner-up; in the drug-drug interaction prediction task, DGERN combined with random forest achieves an AUC of 0.73, ranking second among the compared drug representations.
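Both downstream tasks are reported in terms of AUC, which can be read as the probability that a randomly chosen positive example (for instance, a true drug-disease association) receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that definition, not the evaluation code used in the paper; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and do not come from the reported experiments.

```python
def roc_auc(labels, scores):
    """Return AUC as the fraction of positive-negative pairs ranked correctly.

    Hypothetical helper for illustration: labels are 0/1, scores are the
    classifier's predicted scores. Ties between a positive and a negative
    count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the paper): three positives, three negatives.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

# 6 of the 9 positive-negative pairs are ordered correctly, so auc == 2/3.
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.2f}")
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for interpreting the 0.67 and 0.73 reported above.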