Abstract: Structure-based molecular ML (SBML) models can be highly sensitive to input geometries and give predictions with large variance. We present an approach to mitigate the challenge of selecting conformations for such models by generating conformers that explicitly minimize predictive uncertainty. To achieve this, we compute estimates of aleatoric and epistemic uncertainties that are differentiable w.r.t. latent posteriors. We then iteratively sample new latents in the direction of lower uncertainty by gradient descent. As we train our predictive models jointly with a conformer decoder, the new latent embeddings can be mapped to their corresponding inputs, which we call MoleCLUEs, or (molecular) counterfactual latent uncertainty explanations (Antorán et al., 2020). We assess our algorithm for the task of predicting drug properties from 3D structure with maximum confidence. We additionally analyze the structure trajectories obtained from conformer optimizations, which provide insight into the sources of uncertainty in SBML.

What problem does this paper attempt to address?

The main goal of this paper is to address a challenge that Structure-Based Molecular Machine Learning (SBML) models face when predicting molecular properties: prediction uncertainty increases when input geometries fall outside the training data distribution. The authors propose a method called MoleCLUEs, which generates molecular conformations that minimize prediction uncertainty. Specifically, the problems addressed in the paper can be summarized as follows:

1. **Uncertainty in SBML models**: SBML models are highly sensitive to input geometries and may produce predictions with high variance. This is mainly due to the lack of guiding principles for selecting conformations of new molecules, which increases prediction uncertainty.
2. **Challenges in conformation selection**: Current methods for selecting conformations at prediction time often assume that new conformations follow the same distribution as those in the training set. In real-world scenarios this assumption is hard to guarantee, and the model generalizes poorly as a result.
3. **Needs of high-risk scenarios**: High-risk applications such as drug discovery require more precise and reliable predictions. It is therefore necessary to develop methods that correct model biases introduced by 3D structure generation and that reduce uncertainty when predicting labels for out-of-distribution (OOD) input geometries.

To address these issues, the paper proposes the MoleCLUEs method, which is implemented through the following steps:

- **Differentiable uncertainty estimation**: Compute measures that characterize prediction uncertainty, including aleatoric uncertainty (data noise) and epistemic uncertainty (lack of knowledge or data), in a form that is differentiable with respect to the latent representation.
- **Counterfactual conformation generation**: Use these uncertainty estimates to guide the sampling process, generating new latent representations that correspond to new, in-distribution conformations, referred to as MoleCLUEs.
- **Optimization process**: Iteratively sample new latent representations by gradient descent, moving in the direction of lower uncertainty.
- **Evaluation**: Experiments validate that the MoleCLUEs method effectively reduces prediction uncertainty and improves prediction accuracy, especially for conformations with artificially added noise.

In summary, this study aims to improve the reliability and accuracy of SBML models in applications such as drug discovery by improving how molecular conformations are selected.
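The uncertainty-guided latent optimization described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a small hand-written ensemble of Gaussian heads stands in for the predictive model, aleatoric uncertainty is taken as the mean predicted variance, epistemic uncertainty as the variance of the predicted means across heads, and a proximity penalty (weight `lam`) is added to keep the toy problem bounded. The names `heads`, `uncertainties`, `objective`, `grad_objective`, and `minimise_uncertainty` are hypothetical, and the decoding of the optimized latent back to a conformer is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def head_outputs(z, heads):
    """Per-head predicted mean and variance for latent z (toy linear heads)."""
    means = [dot(w, z) + b for (w, b, v, c) in heads]
    variances = [math.exp(dot(v, z) + c) for (w, b, v, c) in heads]
    return means, variances

def uncertainties(z, heads):
    """Return (aleatoric, epistemic) uncertainty of the ensemble at z."""
    means, variances = head_outputs(z, heads)
    mean_bar = sum(means) / len(means)
    aleatoric = sum(variances) / len(variances)
    epistemic = sum((m - mean_bar) ** 2 for m in means) / len(means)
    return aleatoric, epistemic

def objective(z, z0, heads, lam):
    """Total uncertainty plus a squared-distance penalty to the start latent."""
    aleatoric, epistemic = uncertainties(z, heads)
    penalty = lam * sum((a - b) ** 2 for a, b in zip(z, z0))
    return aleatoric + epistemic + penalty

def grad_objective(z, z0, heads, lam):
    """Analytic gradient of `objective` with respect to z."""
    means, variances = head_outputs(z, heads)
    n_heads = len(heads)
    mean_bar = sum(means) / n_heads
    grad = [2.0 * lam * (a - b) for a, b in zip(z, z0)]
    for (w, b, v, c), mu, var in zip(heads, means, variances):
        for i in range(len(z)):
            grad[i] += var * v[i] / n_heads                    # aleatoric term
            grad[i] += 2.0 * (mu - mean_bar) * w[i] / n_heads  # epistemic term
    return grad

def minimise_uncertainty(z0, heads, lam=0.1, lr=0.05, steps=200):
    """Gradient descent in latent space; a step is accepted only if it improves."""
    z = list(z0)
    best = objective(z, z0, heads, lam)
    for _ in range(steps):
        g = grad_objective(z, z0, heads, lam)
        step = lr
        while step > 1e-8:
            candidate = [zi - step * gi for zi, gi in zip(z, g)]
            value = objective(candidate, z0, heads, lam)
            if value < best:
                z, best = candidate, value
                break
            step *= 0.5  # backtrack when the step overshoots
    return z

# Toy ensemble: each head is (w, b, v, c) with mean w.z + b and variance exp(v.z + c).
heads = [
    ([1.0, 0.5], 0.0, [0.3, -0.2], -1.0),
    ([0.8, -0.4], 0.1, [0.2, 0.1], -1.2),
    ([1.3, 0.2], -0.2, [0.4, 0.0], -0.8),
]
z0 = [2.0, -1.0]  # stand-in for the latent of the original conformer
z_star = minimise_uncertainty(z0, heads)

print("before:", uncertainties(z0, heads))
print("after: ", uncertainties(z_star, heads))
```

In a real model the gradient would come from automatic differentiation through the predictor, and `z_star` would be passed to the jointly trained conformer decoder to obtain the counterfactual structure. The accept-only-if-improved step is a design choice of this sketch that guarantees the objective never increases.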

MoleCLUEs: Molecular Conformers Maximally In-Distribution for Predictive Models
