Abstract: Using "soft" targets to improve model performance has been shown to be effective in classification settings, but the usage of soft targets for regression is a much less studied topic in machine learning. The existing literature on the usage of soft targets for regression fails to properly assess the method's limitations, and empirical evaluation is quite limited. In this work, we assess the strengths and drawbacks of existing methods when applied to molecular property regression tasks. Our assessment outlines key biases present in existing methods and proposes methods to address them, evaluated through careful ablation studies. We leverage these insights to propose Distributional Mixture of Experts (DMoE): a model-independent and data-independent method for regression that trains a model to predict probability distributions of its targets. Our proposed loss function combines the cross entropy between predicted and target distributions and the L1 distance between their expected values to produce a loss function that is robust to the outlined biases. We evaluate the performance of DMoE on different molecular property prediction datasets -- Open Catalyst (OC20), MD17, and QM9 -- across different backbone model architectures -- SchNet, GemNet, and Graphormer. Our results demonstrate that the proposed method is a promising alternative to classical regression for molecular property prediction tasks, showing improvements over baselines on all datasets and architectures.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper examines the use of "soft" targets (i.e., probability distributions over the target value) to improve model performance in molecular property regression tasks. Soft targets have proven effective in classification, but their use in regression, and in molecular property prediction in particular, remains underexplored. The existing literature does not adequately assess the limitations of soft targets for regression, and its empirical evaluation is limited. Specifically, the paper investigates the following issues:

1. **Limitations of Existing Methods**: Evaluating the strengths and drawbacks of existing methods when applied to molecular property regression tasks, and highlighting their key biases.
2. **Proposing Improved Methods**: Introducing a new method, Distributional Mixture of Experts (DMoE), to address the biases in existing methods.
3. **Performance Validation**: Validating the DMoE method through experiments on multiple molecular property prediction datasets (Open Catalyst (OC20), MD17, and QM9).

### Main Contributions

- **Evaluation of Existing Methods**: A detailed analysis of how existing methods perform in molecular property regression tasks, identifying their biases.
- **Proposing the DMoE Method**: Designing a model-independent and data-independent regression method that trains the model to predict the probability distribution of the target.
- **Loss Function Design**: Proposing a loss function that combines the cross entropy between the predicted and target distributions with the L1 distance between their expected values, making it robust to the identified biases.
- **Experimental Validation**: Conducting experiments on multiple datasets and backbone model architectures (SchNet, GemNet, and Graphormer), demonstrating significant improvements with the DMoE method.

### Key Findings

- **Performance Improvement**: The DMoE method shows significant performance improvements over baseline methods on multiple molecular property prediction datasets, particularly in energy prediction and threshold accuracy.
- **Uncertainty Quantification**: Because the model predicts a distribution rather than a single value, the DMoE method naturally quantifies the uncertainty of its predictions.
- **Robustness**: Theoretical analysis and experimental validation demonstrate the advantages of the DMoE method in gradient stability.

### Conclusion

By proposing the DMoE method, the paper addresses the challenges of using soft targets in molecular property regression tasks and validates the method across multiple datasets and architectures. The method improves model performance and offers an alternative to classical regression for molecular property prediction.
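
### Illustrative Sketch of the Loss

The loss described above (cross entropy between predicted and target distributions, plus the L1 distance between their expected values) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify how targets are binned or how the two terms are weighted, so the Gaussian-smoothed histogram targets, the `sigma` and `l1_weight` parameters, and all function names below are hypothetical. The mixture-of-experts component of DMoE is not modeled here.

```python
import numpy as np


def soft_target(y, bin_centers, sigma):
    """Turn scalar targets into normalized histograms over fixed bins.

    Assumption: a Gaussian bump centered on each target; the paper may
    construct its soft targets differently.
    """
    y = np.asarray(y, dtype=float)[:, None]                  # (batch, 1)
    scores = -0.5 * ((bin_centers[None, :] - y) / sigma) ** 2
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)      # rows sum to 1


def log_softmax(logits):
    """Numerically stable log-softmax over the bin axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def distributional_loss(logits, target_probs, bin_centers, l1_weight=1.0):
    """Cross entropy between distributions plus L1 between expected values.

    `l1_weight` is a hypothetical knob; the relative weighting of the two
    terms is not given in the summary.
    """
    log_p = log_softmax(logits)
    p = np.exp(log_p)
    cross_entropy = -(target_probs * log_p).sum(axis=1)
    expected_pred = p @ bin_centers
    expected_target = target_probs @ bin_centers
    l1 = np.abs(expected_pred - expected_target)
    return float((cross_entropy + l1_weight * l1).mean())


def predictive_mean_std(logits, bin_centers):
    """Point prediction and a simple spread estimate from the predicted histogram."""
    p = np.exp(log_softmax(logits))
    mean = p @ bin_centers
    var = p @ bin_centers**2 - mean**2
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Usage: two targets, 21 bins on [-1, 1], and an untrained (uniform) prediction.
bin_centers = np.linspace(-1.0, 1.0, 21)
targets = soft_target([0.2, -0.5], bin_centers, sigma=0.1)
uniform_logits = np.zeros((2, 21))
loss_uniform = distributional_loss(uniform_logits, targets, bin_centers)
mean, std = predictive_mean_std(uniform_logits, bin_centers)
```

The standard deviation of the predicted histogram is one simple way to read off an uncertainty estimate of the kind mentioned under Key Findings; the paper may use a different measure.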

Distribution Learning for Molecular Regression

Understanding the Limitations of Deep Models for Molecular Property Prediction: Insights and Solutions.

Learning Substructure Invariance for Out-of-Distribution Molecular Representations

Fast and Effective Molecular Property Prediction with Transferability Map

Coordinating Cross-modal Distillation for Molecular Property Prediction

Flexible Dual-Branched Message-Passing Neural Network for a Molecular Property Prediction

Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective

Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey

Learning Invariant Molecular Representation in Latent Discrete Space

Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

Analyzing Learned Molecular Representations for Property Prediction

Flexible dual-branched message passing neural network for quantum mechanical property prediction with molecular conformation

MvMRL: a multi-view molecular representation learning method for molecular property prediction

Predicting equilibrium distributions for molecular systems with deep learning

Evaluating Scalable Uncertainty Estimation Methods for Deep Learning-Based Molecular Property Prediction

DropConn: Dropout Connection Based Random GNNs for Molecular Property Prediction

Evidential Deep Learning for Guided Molecular Property Prediction and Discovery

Evidential meta-model for molecular property prediction

Low cost prediction of probability distributions of molecular properties for early virtual screening