Diffusion-Driven Domain Adaptation for Generating 3D Molecules

Haokai Hong, Wanyu Lin, Kay Chen Tan
2024-04-01
Abstract: Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as the problem of domain adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that allows shifting a generative model to desired new domains without the need to collect even a single molecule. As the domain shift is typically caused by the structure variations of molecules, e.g., scaffold variations, we leverage a designated equivariant masked autoencoder (MAE) along with various masking strategies to capture the structural-grained representations of the in-domain varieties. In particular, with an asymmetric encoder-decoder module, the MAE can generalize to unseen structure variations from the target domains. These structure variations are encoded with an equivariant encoder and treated as domain supervisors to control denoising. We show that, with these encoded structural-grained domain supervisors, GADM can generate effective molecules within the desired new domains. We conduct extensive experiments across various domain adaptation tasks over benchmarking datasets. We show that our approach can improve up to 65.6% in terms of success rate defined based on molecular validity, uniqueness, and novelty compared to alternative baselines.
Machine Learning, Chemical Physics, Biomolecules
What problem does this paper attempt to address?
The paper asks how to adapt a 3D molecule generator to a new domain without collecting any molecules from that domain. It proposes GADM, a diffusion-based method that uses an equivariant masked autoencoder (MAE) to encode molecular structure variations and treats those encodings as domain supervisors that control the denoising process, so that the model generates effective molecules in the target domain. The method specifically targets distribution shift caused by structural changes in molecules, such as scaffold and ring-structure variations. Experiments show that GADM improves success rate by up to 65.6% over baseline methods and performs well at generating novel molecules with the desired structural variations.
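The success rate mentioned above is described only as being based on validity, uniqueness, and novelty; how the three criteria are combined is not stated here. As a rough illustration, the sketch below assumes one plausible reading: a generated molecule counts as a success if it is valid, is not a repeat of an earlier sample, and does not appear in a reference set. The function name success_rate, the string identifiers, and the toy validity check are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable


def success_rate(
    generated: Iterable[str],
    reference: Iterable[str],
    is_valid: Callable[[str], bool],
) -> float:
    """Fraction of generated samples that are valid, unique, and novel.

    Assumption (not from the paper): the three criteria are combined
    with a logical AND, and the denominator is the total sample count.
    """
    samples = list(generated)
    if not samples:
        return 0.0
    known = set(reference)   # molecules already seen, e.g. training data
    seen = set()             # valid molecules generated so far
    hits = 0
    for mol in samples:
        if not is_valid(mol):
            continue         # fails validity
        if mol in seen:
            continue         # fails uniqueness (repeat of an earlier sample)
        seen.add(mol)
        if mol not in known:
            hits += 1        # valid, unique, and novel
    return hits / len(samples)


def toy_is_valid(identifier: str) -> bool:
    # Toy stand-in for a chemistry validity check: balanced parentheses only.
    return identifier.count("(") == identifier.count(")")


example_generated = ["CCO", "CCO", "c1ccccc1", "C(", "CCN"]
example_reference = {"c1ccccc1"}
example_rate = success_rate(example_generated, example_reference, toy_is_valid)
print(example_rate)  # 2 successes ("CCO", "CCN") out of 5 samples
```

In practice the validity check would come from a cheminformatics toolkit and the identifiers would be canonicalized before comparison; both steps are omitted here to keep the sketch self-contained.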