Abstract: Background Public resources of chemical compounds are growing rapidly in both quantity and the variety of data representations. Comprehensively understanding the relationship between the intrinsic features of chemical compounds and their protein targets is essential for evaluating potential protein-binding function in virtual drug screening. Previous studies proposed correlations between bioactivity profiles and target networks, especially when chemical structures were similar. Because effective quantitative methods for uncovering such correlations are lacking, information from multiple data sources needs to be integrated to produce a comprehensive assessment of the similarity between small molecules, and to quantitatively uncover the relationship between compounds and their targets under such an integrated scheme.

Results In this study, a multi-view based clustering algorithm was introduced to quantitatively integrate compound similarity from both bioactivity profiles and structural fingerprints. First, hierarchical clustering was performed with the fused similarity on 37 compounds curated from PubChem. Compared with clustering in a single view, the integrated similarity increased the overall number of common targets within the fused classes, indicating that the multi-view based clustering is more efficient at identifying clusters whose members share more common targets. Analysis of certain classes reveals that the two views complement each other in describing compounds, which helps to discover similar compounds that are missed when only a single view is applied. Then, a large-scale virtual drug screen based on the fused similarity was performed on 1267 compounds curated from the Connectivity Map (CMap) dataset, and it produced a better ranking than the single-view screens. These comprehensive tests indicate that combining different data representations yields an improved assessment of target-specific compound similarity.

Conclusions Our study presents an efficient, extendable and quantitative computational model for integrating different compound representations, and it is expected to provide new clues for improving virtual drug screening based on various pharmacological properties. Scripts, supplementary materials and data used in this study are publicly available at http://lifecenter.sgst.cn/fusion/ .
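The idea of fusing a structural view and a bioactivity view can be illustrated with a minimal sketch. It is not the multi-view algorithm used in the study, whose details the abstract does not give. As a stand-in it assumes a plain weighted average of a Tanimoto similarity on fingerprint bits and a rescaled Pearson correlation on bioactivity profiles. All names (`fused_similarity`, `rank_by_similarity`, `weight`) and the toy compounds are hypothetical.

```python
from math import sqrt


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length bioactivity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fused_similarity(c1, c2, weight=0.5):
    """Weighted average of the two views (an assumed, simplified fusion rule).

    weight=1.0 uses structure only; weight=0.0 uses bioactivity only.
    """
    s_struct = tanimoto(c1["bits"], c2["bits"])
    # Rescale the correlation from [-1, 1] to [0, 1] so both views share a range.
    s_bio = (pearson(c1["profile"], c2["profile"]) + 1.0) / 2.0
    return weight * s_struct + (1.0 - weight) * s_bio


def rank_by_similarity(query, library, weight=0.5):
    """Rank library compounds by fused similarity to the query, best first."""
    scored = [(name, fused_similarity(query, cpd, weight))
              for name, cpd in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): cpd_C shares no fingerprint bits with the query
# but has a very similar bioactivity profile.
library = {
    "cpd_A": {"bits": {1, 2, 3, 4}, "profile": [0.9, 0.1, 0.8]},
    "cpd_B": {"bits": {1, 2, 3, 5}, "profile": [0.1, 0.9, 0.2]},
    "cpd_C": {"bits": {7, 8, 9}, "profile": [0.8, 0.2, 0.9]},
}
query = {"bits": {1, 2, 3, 4}, "profile": [1.0, 0.0, 1.0]}

fused_ranking = rank_by_similarity(query, library, weight=0.5)
structure_only_ranking = rank_by_similarity(query, library, weight=1.0)
```

On this toy data the structure-only ranking places `cpd_C` last, while the fused ranking moves it above `cpd_B`. This is the kind of complementarity between views that the Results describe. A real pipeline would also feed the fused similarity matrix into hierarchical clustering, which is omitted here.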
