Navigating Chemical Space with Latent Flows

Guanghao Wei, Yining Huang, Chenru Duan, Yue Song, Yuanqi Du
2024-05-08
Abstract: Recent progress of deep generative models in the vision and language domain has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space through navigating the latent space learned by molecule generative models through flows. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches on molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Codes and demos are publicly available on GitHub at
Machine Learning, Chemical Physics
What problem does this paper attempt to address?
This paper addresses how to explore and understand chemical space efficiently through the latent space learned by molecular generative models. Existing approaches either search for new molecules with combinatorial optimization or train deep generative models to approximate the molecular distribution and sample from it. Both show promise for small-molecule, protein, and materials design, but the sheer size of chemical space (estimates of the number of drug-like molecules range from 10^23 to 10^60) calls for more efficient search methods and a better understanding of how that space is structured.

The paper proposes a framework called ChemFlow, which traverses the latent space along flows rather than along fixed linear directions. From a dynamical-systems perspective, the task becomes learning a vector field that transports the molecular distribution toward regions with desired properties or greater structural diversity. This view unifies earlier methods, including gradient-based optimization, linear latent traversal, and disentangled traversal, and motivates alternative flows governed by partial differential equations such as the heat equation and the wave equation, which act as physical priors on the latent dynamics.

ChemFlow is evaluated on molecule manipulation and on single- and multi-objective molecule optimization, in both supervised and unsupervised settings. In the unsupervised setting, it searches for trajectories that maximize the variation in molecular structure, which in turn changes molecular properties. The reported experiments indicate that the framework is general and effective across these tasks, and that in some cases it outperforms existing methods. In summary, the paper addresses how to accelerate molecular discovery, particularly in drug design and materials discovery, by exploring and understanding the latent space of deep generative models more efficiently.
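To make the idea of traversing a latent space along a vector field concrete, here is a minimal sketch in Python with NumPy. It is not the ChemFlow implementation. The quadratic `toy_property`, the `target` point, the latent dimension, and the step size are all assumptions chosen for illustration, and no molecule encoder or decoder is involved. It contrasts a gradient-based field, which depends on the current latent point, with a linear traversal that uses one fixed direction everywhere.

```python
import numpy as np


def toy_property(z, target):
    """Hypothetical surrogate property: higher is better, peaks at `target`."""
    return -0.5 * np.sum((z - target) ** 2, axis=-1)


def gradient_field(z, target):
    """Vector field v(z) = gradient of toy_property, known in closed form here."""
    return target - z


def linear_field(z, direction):
    """Linear traversal: the same fixed direction at every latent point."""
    return np.broadcast_to(direction, z.shape)


def traverse(z0, field, n_steps=10, step_size=0.1):
    """Forward-Euler integration of dz/dt = field(z); returns the whole path."""
    path = [z0]
    z = z0
    for _ in range(n_steps):
        z = z + step_size * field(z)
        path.append(z)
    return np.stack(path)


rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 8))   # 4 latent codes of dimension 8 (arbitrary sizes)
target = np.ones(8)            # arbitrary optimum of the toy property

# Gradient-based traversal: the step direction changes with the latent point.
grad_path = traverse(z0, lambda z: gradient_field(z, target))

# Linear traversal: every latent code moves along the same unit direction.
direction = np.ones(8) / np.sqrt(8)
lin_path = traverse(z0, lambda z: linear_field(z, direction))

start_scores = toy_property(grad_path[0], target)
end_scores = toy_property(grad_path[-1], target)
print("toy property before:", np.round(start_scores, 3))
print("toy property after: ", np.round(end_scores, 3))
```

In a real pipeline each point on the path would be decoded back into a molecule and scored by an actual property predictor or oracle; that step is omitted here. The sketch only shows the shared structure: different traversal methods correspond to different choices of `field` passed to the same integrator.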