Zero Shot Molecular Generation via Similarity Kernels

Rokas Elijošius, Fabian Zills, Ilyes Batatia, Sam Walton Norwood, Dávid Péter Kovács, Christian Holm, Gábor Csányi
2024-02-14
Abstract: Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during the generation the score resembles a restorative potential initially and a quantum-mechanical force at the end. In between the two endpoints, it exhibits special properties that enable the building of large molecules. Using insights from the trained model, we present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation. SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules without any further training. Our approach allows full control over the molecular shape through point cloud priors and supports conditional generation. We also release an interactive web tool that allows users to generate structures with SiMGen online (
Chemical Physics, Machine Learning
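The abstract's observation that the score "resembles a restorative potential initially" has a simple toy analogue, sketched below under stated assumptions; this is not the authors' trained model. For a handful of hypothetical 2D "atom" positions p_k blurred by Gaussian noise of width σ, the score is grad log p_σ(x) = Σ_k w_k(x) (p_k − x) / σ², with softmax weights w_k. Reading E = −log p_σ as an energy, the score is the force −grad E. When σ is large, all weights are nearly equal and the score reduces to (centroid − x)/σ², a harmonic restoring force; when σ is small, it pulls x toward the nearest point. The toy cannot reproduce the quantum-mechanical regime the authors report at the end of generation.

```python
import numpy as np


def smoothed_score(x, points, sigma):
    """Score grad_x log p_sigma(x) for `points` blurred by isotropic Gaussian noise.

    With E = -log p_sigma treated as an energy, this score equals the force -grad E.
    """
    diff = points - x                                   # (K, D): vectors from x to each point
    logw = -np.sum(diff**2, axis=1) / (2 * sigma**2)    # unnormalised log-weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                        # softmax weights over the points
    return (w[:, None] * diff).sum(axis=0) / sigma**2   # sum_k w_k (p_k - x) / sigma^2


# Hypothetical toy "atoms", centred at the origin.
points = np.array([[1.0, 0.0], [-1.0, 0.0]])

# Large sigma: score * sigma^2 is approximately -x, a harmonic pull toward the centroid.
x = np.array([3.0, 1.0])
print(smoothed_score(x, points, 100.0) * 100.0**2)

# Small sigma: score * sigma^2 is approximately (nearest point - x), a short-range pull.
x_near = np.array([0.9, 0.1])
print(smoothed_score(x_near, points, 0.1) * 0.1**2)
```

Multiplying by σ² simply removes the overall 1/σ² scale so the direction and relative size of the pull can be compared across noise levels.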
What problem does this paper attempt to address?
This paper addresses molecular generation, specifically zero-shot molecular generation using similarity kernels. Generative models, and diffusion models in particular, have shown strong performance in accelerating the discovery of new chemicals, but how they work is not fully understood. The researchers analyzed the score by training an energy-based diffusion model and found that, over the course of generation, the score first resembles a restorative potential and at the end resembles a quantum-mechanical force, with intermediate behavior that enables the building of large molecules. Building on these insights, they propose Similarity-based Molecular Generation (SiMGen), which combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules without any further training. SiMGen gives full control over molecular shape through point cloud priors and supports conditional generation.

The approach targets the limitations of traditional methods such as ab initio random structure searching (AIRSS), which struggle with complex molecules and often produce fragmented structures. Because SiMGen relies on local similarity kernels and pretrained force-field descriptors, its generation process is local and scalable, which makes it possible to build large molecules and to perform conditional generation. The paper also notes that, although diffusion models perform well in 3D molecular generation, they suffer from poor scalability, limited user control, and low transferability. SiMGen addresses these issues using similarity kernels and evolutionary algorithms, yielding a "zero-shot" method that requires no model training.

In summary, the paper asks how to efficiently generate novel molecules with desirable properties, in particular how to overcome the difficulties existing methods face with complex and large molecules, and proposes a new similarity-based generation strategy to do so.
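The summary does not give SiMGen's equations, so the following is only a minimal sketch of the general idea under loudly stated assumptions, not the authors' implementation. Assumed: (1) a hypothetical per-atom descriptor (a Gaussian-smeared histogram of neighbour distances) stands in for the pretrained force-field descriptors; (2) the similarity kernel is Gaussian in descriptor space, giving an energy E = −Σ_i log Σ_m exp(−|d_i − D_m|² / (2σ²)) against reference descriptors D_m; (3) "time dependence" is modelled by shrinking the bandwidth σ geometrically; (4) the prior is an isotropic Gaussian point cloud; (5) sampling uses annealed Langevin dynamics with a step size proportional to σ². The score is taken by finite differences to avoid an autodiff dependency. Atoms carry no element types, so this cannot produce chemically meaningful molecules.

```python
import numpy as np

# Hypothetical descriptor settings (not from the paper): radial grid and width in Angstrom.
MU = np.linspace(0.8, 3.0, 12)
WIDTH = 0.2


def descriptors(pos):
    """Toy per-atom descriptor: Gaussian-smeared histogram of neighbour distances.

    Hypothetical stand-in for descriptors from a pretrained force field.
    """
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)  # (N, N) distances
    np.fill_diagonal(r, np.inf)                                     # drop self-pairs
    g = np.exp(-((r[..., None] - MU) ** 2) / (2 * WIDTH**2))        # (N, N, K)
    return g.sum(axis=1)                                            # (N, K)


def kernel_energy(pos, ref_desc, sigma):
    """Assumed form: E = -sum_i log sum_m exp(-|d_i - D_m|^2 / (2 sigma^2))."""
    d = descriptors(pos)
    sq = ((d[:, None, :] - ref_desc[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    a = -sq / (2 * sigma**2)
    amax = a.max(axis=1, keepdims=True)                               # stable log-sum-exp
    lse = amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))
    return -lse.sum()


def score(pos, ref_desc, sigma, h=1e-4):
    """Score -dE/dpos by central finite differences (slow, but dependency-free)."""
    grad = np.zeros_like(pos)
    for idx in np.ndindex(*pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        grad[idx] = (kernel_energy(plus, ref_desc, sigma)
                     - kernel_energy(minus, ref_desc, sigma)) / (2 * h)
    return -grad


def generate(ref_pos, n_atoms, n_steps=200, sigma_max=2.0, sigma_min=0.1, eps=1e-3, seed=0):
    """Annealed Langevin dynamics on exp(-E), shrinking the kernel bandwidth sigma."""
    rng = np.random.default_rng(seed)
    ref_desc = descriptors(ref_pos)
    pos = rng.normal(scale=1.5, size=(n_atoms, 3))   # assumed isotropic Gaussian prior
    for sigma in np.geomspace(sigma_max, sigma_min, n_steps):
        step = eps * sigma**2                        # step size scales with sigma^2
        noise = rng.normal(size=pos.shape)
        pos = pos + step * score(pos, ref_desc, sigma) + np.sqrt(2 * step) * noise
    return pos


# Usage: a hypothetical 6-atom ring (neighbour distance about 1.39 Angstrom) as reference.
angles = np.arange(6) * np.pi / 3
ring = np.stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)], axis=1)
sample = generate(ring, n_atoms=6, n_steps=50)
print(np.round(sample, 2))
```

Scaling the step with σ² keeps the drift term roughly independent of the bandwidth, since the kernel gradient itself grows like 1/σ². Replacing the toy descriptor with learned per-atom features from a real force field, and the finite differences with automatic differentiation, would be the natural next steps for anything beyond illustration.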