Leveraging Active Subspaces to Capture Epistemic Model Uncertainty in Deep Generative Models for Molecular Design

A N M Nafiz Abeer, Sanket Jantre, Nathan M Urban, Byung-Jun Yoon
2024-08-16
Abstract: Deep generative models have been accelerating the inverse design process in material and drug design. Unlike their counterpart property predictors in typical molecular design frameworks, generative molecular design models have seen fewer efforts on uncertainty quantification (UQ) due to computational challenges in Bayesian inference posed by their large number of parameters. In this work, we focus on the junction-tree variational autoencoder (JT-VAE), a popular model for generative molecular design, and address this issue by leveraging the low dimensional active subspace to capture the uncertainty in the model parameters. Specifically, we approximate the posterior distribution over the active subspace parameters to estimate the epistemic model uncertainty in an extremely high dimensional parameter space. The proposed UQ scheme does not require alteration of the model architecture, making it readily applicable to any pre-trained model. Our experiments demonstrate the efficacy of the AS-based UQ and its potential impact on molecular optimization by exploring the model diversity under epistemic uncertainty.
Machine Learning, Quantitative Methods
### What problem does this paper attempt to solve?

This paper addresses uncertainty quantification (UQ) for deep generative models used in molecular design, focusing on the junction-tree variational autoencoder (JT-VAE). Specifically, the authors target **epistemic uncertainty**, that is, the predictive uncertainty that arises from uncertainty in the model parameters.

#### Main problem background:

1. **Application of generative models in molecular design**: Deep generative models such as JT-VAE are widely used in materials and drug design. They support inverse design: molecules with target properties are generated by searching the model's latent space.
2. **Importance of uncertainty quantification**: Although generative models perform well in molecular design, their model uncertainty has received relatively little attention. For generative models with a large number of parameters, Bayesian inference is computationally very challenging.
3. **Limitations of existing methods**: Most existing work focuses on the uncertainty of molecular property predictors, and the uncertainty of the generative models themselves has not been fully explored. This limits the robust application of generative models in fields such as drug design.

#### Core contributions of the paper:

To address these problems, the authors propose using a **low-dimensional active subspace** to capture the epistemic uncertainty of the JT-VAE model parameters. The main advantages of this method include:

- **Reducing computational complexity**: Projecting the high-dimensional parameter space onto a low-dimensional active subspace makes approximating the posterior distribution feasible.
- **No need to modify the model architecture**: The method can be applied to any pre-trained generative model without changing the model structure, so it is general and easy to plug in.
- **Improving prediction performance**: Replacing a single point estimate with a distribution over parameters makes the model more robust across different types of training molecules, which improves predictive performance.

#### Method overview:

1. **Constructing the active subspace**: Gradients with respect to the model parameters are sampled and used to construct a low-dimensional active subspace that captures the directions in which parameter changes most affect the output.
2. **Approximating the posterior distribution**: Variational inference is used to approximate the posterior distribution over the active subspace parameters, thereby quantifying the epistemic uncertainty of the model parameters.
3. **Experimental verification**: Optimization experiments on multiple molecular properties verify the effectiveness of the proposed method and demonstrate its potential in molecular generation tasks.

In short, the paper tackles the difficulty of quantifying uncertainty in the high-dimensional parameter space of deep generative models by working in an active subspace, providing a more reliable and efficient tool for molecular design.
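The first two steps of the method overview can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a toy linear model stands in for JT-VAE, the gradients are taken of a mean squared error loss at parameters perturbed around the "pre-trained" weights, and the Gaussian over the subspace coordinates (`omega_mean`, `omega_std`) is a hypothetical placeholder for a posterior that the paper fits with variational inference. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def loss_grad(theta, X, y):
    """Gradient of the mean squared error of a toy linear model y ~ X @ theta."""
    residual = X @ theta - y
    return 2.0 * X.T @ residual / len(y)

def active_subspace(grads, k):
    """Top-k eigenvectors of the uncentered gradient covariance G^T G / n.

    The SVD of the (n_samples, D) gradient matrix G gives the same
    eigenvectors without forming the D x D matrix explicitly.
    """
    _, s, vt = np.linalg.svd(grads, full_matrices=False)
    eigvals = s**2 / grads.shape[0]
    return vt[:k].T, eigvals

rng = np.random.default_rng(0)
D, n_data, k = 50, 200, 3

# Toy data: inputs vary along only 3 directions, so the loss is
# sensitive to only a few directions in the 50-dimensional parameter space.
basis = rng.normal(size=(3, D))
X = rng.normal(size=(n_data, 3)) @ basis
theta0 = rng.normal(size=D)  # stands in for the pre-trained weights
y = X @ theta0 + 0.1 * rng.normal(size=n_data)

# Step 1: sample gradients at parameters perturbed around theta0 (assumed scheme).
grads = np.stack(
    [loss_grad(theta0 + 0.1 * rng.normal(size=D), X, y) for _ in range(100)]
)
P, eigvals = active_subspace(grads, k)  # P has shape (D, k)

# Step 2 (placeholder): a Gaussian over the k subspace coordinates.
# In the paper this distribution is learned with variational inference.
omega_mean, omega_std = np.zeros(k), 0.05 * np.ones(k)
n_samples = 500
omega = omega_mean + omega_std * rng.normal(size=(n_samples, k))

# Map subspace samples back to full parameters: theta = theta0 + P @ omega.
theta_samples = theta0 + omega @ P.T  # shape (n_samples, D)

# The spread of predictions across parameter samples is the epistemic uncertainty.
X_test = rng.normal(size=(5, 3)) @ basis
preds = theta_samples @ X_test.T      # shape (n_samples, 5)
pred_std = preds.std(axis=0)
print("leading eigenvalues:", eigvals[:k])
print("predictive std per test input:", pred_std)
```

Only the `k` subspace coordinates are treated as random here, so inference runs over 3 values rather than 50. The same reduction is what makes a posterior approximation tractable when the full parameter count is in the millions.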