Abstract: A central task in computational drug discovery is to construct models from known active molecules to find further promising molecules for subsequent screening. However, typically only very few active molecules are known. Therefore, few-shot learning methods have the potential to improve the effectiveness of this critical phase of the drug discovery process. We introduce a new method for few-shot drug discovery. Its main idea is to enrich a molecule representation with knowledge about known context or reference molecules. Our novel concept for molecule representation enrichment is to associate molecules from both the support set and the query set with a large set of reference (context) molecules through a Modern Hopfield Network. Intuitively, this enrichment step is analogous to a human expert who would associate a given molecule with familiar molecules whose properties are known. The enrichment step reinforces and amplifies the covariance structure of the data, while simultaneously removing spurious correlations arising from the decoration of molecules. Our approach is compared with other few-shot methods for drug discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms all compared methods and therefore sets a new state of the art for few-shot learning in drug discovery. An ablation study shows that the enrichment step of our method is the key to improving predictive quality. In a domain shift experiment, we further demonstrate the robustness of our method. Code is available at https://github.com/ml-jku/MHNfs.

What problem does this paper attempt to address?

### The Problem This Paper Attempts to Solve

This paper addresses data scarcity in drug discovery: how to improve the effectiveness of predictive models when only a small number of known active molecules are available. Specifically:

1. **Low-Data Problem in Drug Discovery**:
   - Drug discovery typically requires a large amount of biological activity data, but the amount of data available in actual projects is very limited. Traditional deep learning methods require hundreds or thousands of data points to train high-accuracy predictive models.
   - In drug design projects, obtaining large amounts of data is very difficult because in vitro experiments are expensive and time-consuming.
2. **Improving Few-Shot Learning Methods**:
   - Existing few-shot learning methods often perform worse than simple baseline models in drug discovery tasks, as these methods tend to ignore background information (such as similar molecules and similar activities).
   - To address this, the paper proposes a new method that enriches molecular representations by associating the query set and support set with a large number of background (context) molecules, thereby improving the quality of the predictive model.

### Main Contributions

1. **Proposing a New Architecture, MHNfs**:
   - Utilizing Modern Hopfield Networks (MHN) to enrich molecular representations, achieving state-of-the-art results on the FS-Mol benchmark dataset.
2. **Introducing the Concept of Context Enrichment**:
   - Enriching molecular representations by associating them with a large number of background molecules, thereby improving the model's generalization ability.
3. **Adding a Simple Baseline**:
   - Adding a simple baseline model to the FS-Mol benchmark dataset, which outperforms most published few-shot learning methods.
4. **Experimental Validation**:
   - Further demonstrating the effectiveness of the new method through ablation studies and a domain shift experiment.
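
The context-enrichment idea summarized above, associating each support or query molecule with a large set of context molecules through a Modern Hopfield Network, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes molecules are already embedded as fixed-length vectors, uses the generic Hopfield retrieval rule (a softmax over scaled similarities followed by a weighted average of the stored patterns), and omits any learned projections or normalization the real model may use. The names `hopfield_enrich`, `molecules`, `context`, and `beta` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hopfield_enrich(molecules, context, beta=1.0):
    """Illustrative Hopfield-style retrieval (an assumption, not the MHNfs code).

    molecules: (n, d) array of support/query molecule embeddings.
    context:   (m, d) array of context (reference) molecule embeddings.
    beta:      inverse temperature; larger values retrieve sharper matches.

    Returns the enriched embeddings (n, d) and the association weights (n, m).
    """
    scores = beta * molecules @ context.T   # similarity to every context molecule
    weights = softmax(scores, axis=1)       # each row sums to 1
    enriched = weights @ context            # weighted average of context embeddings
    return enriched, weights


# Toy example: four orthogonal "context molecules" and one query equal to the third.
context = np.eye(4)
query = context[[2]]
enriched, weights = hopfield_enrich(query, context, beta=20.0)
print(weights.round(3))
print(enriched.round(3))
```

With a large `beta`, the query is mapped almost entirely onto its closest context molecule; with a smaller `beta`, the enriched representation becomes a broader average over the context set, which is the sense in which the representation is "enriched" by familiar molecules.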

Context-enriched molecule representations improve few-shot drug discovery

An Image-enhanced Molecular Graph Representation Learning Framework

Contextual Representation Anchor Network to Alleviate Selection Bias in Few-Shot Drug Discovery

Mol2Context-vec: learning molecular representation from context awareness for drug discovery

Low Data Drug Discovery with One-Shot Learning

Multimodal Protein-Ligand Contrastive Pretraining for Effective and Efficient Drug Discovery

In-Context Learning for Few-Shot Molecular Property Prediction

Few-shot molecular property prediction via Hierarchically Structured Learning on Relation Graphs

MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning

Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction

Advancing Drug Discovery with Deep Learning: Harnessing Reinforcement Learning and One-Shot Learning for Molecular Design in Low-Data Situations

Few-Shot Learning for Low-Data Drug Discovery

SynthFormer: Equivariant Pharmacophore-based Generation of Molecules for Ligand-Based Drug Design

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Cross-Domain Few-Shot Learning by Representation Fusion

Improvement of multi-task learning by data enrichment: application for drug discovery

Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery

Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks

Few-shot learning with transformers via graph embeddings for molecular property prediction

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Few-shot learning via graph embeddings with convolutional networks for low-data molecular property prediction