Abstract: Identifying how a given chemical of interest exerts its impact on biological systems is a critical step in developing new medicines and chemical products. The mechanism of a query compound of interest can sometimes be identified when its image-based morphological profile matches a compound in a library of well-annotated compound profiles. In this study, we demonstrate a significant improvement in classification performance by incorporating side information: gene representations. We generate these representations using the morphological profiles of cells where the level of a single gene's expression has been artificially increased or decreased. The genes are selected as those encoding known protein targets of annotated compounds in the library. A transformer model is trained to classify gene-compound pairs, where each pair represents a potential interaction between a gene and a compound, as true or false. Subsequently, the model generates a ranked list of likely target genes for a previously unseen query compound. Although the strategy exhibits high performance only for compounds that target previously encountered genes (likely due to the limited size of our training dataset), the performance increase demonstrates a notable improvement over simply matching compound profiles directly to compound profiles or to gene profiles. Larger datasets may improve the prediction capabilities of this approach, enabling the prediction of gene targets for novel compounds, which can then be experimentally validated.
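
The baseline mentioned in the abstract, matching a compound profile directly to gene profiles, can be illustrated with a minimal sketch. This is not the authors' code: the function name `rank_genes`, the toy three-feature profiles, the gene names, and the option to use absolute cosine similarity (on the assumption that a compound and a gene perturbation may shift morphology in opposite directions) are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_genes(compound_profile, gene_profiles, gene_names, use_abs=True):
    """Rank genes by cosine similarity between one compound profile and each gene profile.

    Assumes all profiles are non-zero vectors of the same length.
    use_abs=True scores anti-correlated profiles as matches too (an assumption
    of this sketch, not a detail taken from the paper).
    """
    c = np.asarray(compound_profile, dtype=float)
    G = np.asarray(gene_profiles, dtype=float)
    c_unit = c / np.linalg.norm(c)
    G_unit = G / np.linalg.norm(G, axis=1, keepdims=True)
    scores = G_unit @ c_unit              # one cosine similarity per gene
    if use_abs:
        scores = np.abs(scores)
    order = np.argsort(-scores, kind="stable")  # highest score first
    return [(gene_names[i], float(scores[i])) for i in order]

# Toy example with made-up three-feature profiles (hypothetical data).
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]
toy_gene_profiles = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.1]]
toy_compound = [1.0, 0.1, 0.0]

for name, score in rank_genes(toy_compound, toy_gene_profiles, toy_genes):
    print(f"{name}: {score:.3f}")
```

The trained classifier described in the abstract replaces the fixed similarity score with a learned score for each gene-compound pair; the ranking step stays the same.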

What problem does this paper attempt to address?

The paper addresses a key problem in drug discovery: how to efficiently identify the likely protein targets of candidate drugs. Specifically, it uses cell morphological phenotype information to improve the accuracy of drug target prediction.

The research background includes:

- Drug development is costly, and the cost continues to rise.
- With advances in genomics, the discovery of new diseases and their subtypes is accelerating, making it difficult for traditional drug screening methods to keep pace.
- Existing target deconvolution techniques are often expensive and time-consuming, and their results are uncertain.

To address these issues, the authors propose a machine learning approach that uses high-dimensional data from cell imaging assays (such as Cell Painting) to train a model that predicts interactions between compounds and specific genes. The key innovation is the integration of gene morphological phenotypes: the changes in cell morphology produced by increasing or decreasing the expression level of a single gene serve as the feature representation of that gene.

The main contributions of the study are:

1. **Model Design**: The model uses the Transformer architecture, a deep learning model widely used in natural language processing for sequence-to-sequence tasks. The morphological phenotypes of genes are encoded as embeddings, and compound features are likewise transformed into embeddings. The model's task is to infer the potential targets of a compound from the gene embeddings.
2. **Experimental Design**: The study used a dataset of 302 compounds and 160 genes (CPJUMP1), selected to cover a variety of biological targets. The model's generalization ability was evaluated by splitting the dataset in different ways (such as leave-one-compound-out and leave-one-gene-out strategies).
3. **Performance Improvement**: Compared with traditional methods that only match compound profiles, the model significantly improved prediction accuracy, especially when predicting targets for new compounds whose target genes were seen during training. Performance declined, however, for new compounds whose target genes were not seen during training.

In summary, the study demonstrates that integrating gene morphological phenotype information into machine learning models can improve the accuracy of drug target prediction, which matters for accelerating drug discovery. Accurate predictions for genes never seen in training, however, will require larger datasets and further exploration of model architectures and techniques.
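
The pair-classification setup and the leave-one-out evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the helper names `build_pairs` and `leave_out_split`, the compound and gene identifiers, and the toy annotation table are hypothetical, and the real splits may handle annotations shared across compounds differently.

```python
from itertools import product

def build_pairs(compounds, genes, known_targets):
    """Label every (compound, gene) pair: 1 if the gene is an annotated target, else 0."""
    return [
        (c, g, int(g in known_targets.get(c, set())))
        for c, g in product(compounds, genes)
    ]

def leave_out_split(pairs, held_out, axis):
    """Split pairs so that held-out entities appear only in the test set.

    axis=0 holds out compounds (leave-one-compound-out style),
    axis=1 holds out genes (leave-one-gene-out style).
    """
    train = [p for p in pairs if p[axis] not in held_out]
    test = [p for p in pairs if p[axis] in held_out]
    return train, test

# Toy annotations (hypothetical identifiers, not taken from CPJUMP1).
compounds = ["cpd1", "cpd2", "cpd3"]
genes = ["G1", "G2"]
known_targets = {"cpd1": {"G1"}, "cpd2": {"G1"}, "cpd3": {"G2"}}

pairs = build_pairs(compounds, genes, known_targets)

# Hold out one compound: its target gene G1 still has a positive pair in training.
train_c, test_c = leave_out_split(pairs, {"cpd2"}, axis=0)

# Hold out one gene: no pair involving G2 is available during training.
train_g, test_g = leave_out_split(pairs, {"G2"}, axis=1)

print(len(train_c), len(test_c), len(train_g), len(test_g))
```

The two splits correspond to the two regimes in the summary: a held-out compound whose target gene still appears in training pairs, and a held-out gene that the model never sees, which is the harder case.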

Cell morphological representations of genes enhance prediction of drug targets
