Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model

Bo Qiang, Yiran Zhou, Yuheng Ding, Ningfeng Liu, Song Song, Liangren Zhang, Bo Huang, Zhenming Liu
DOI: https://doi.org/10.1038/s42256-023-00764-9
2024-03-07
Abstract: Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we propose a unified framework that addresses both reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by organic chemistry mechanisms, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, our generative framework overcomes the limitations of current molecule generation models that rely on a small number of reaction templates. In extensive experiments, our model generates high-quality, synthesizable, drug-like structures. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.
Machine Learning, Biomolecules
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses several key issues in chemical reaction representation learning and conditional molecule generation, proposing a unified deep learning framework to tackle them.

#### Main Issues

1. **Chemical Reaction Representation Learning**: Existing methods for chemical reaction representation overlook fundamental principles of organic chemistry, which limits their performance. For example, masking bonds or atoms outside the reaction center results in the loss of important information.
2. **Molecular Generation Tasks**: Traditional template-based methods rely on predefined building blocks and reactions, which restricts the accessible chemical space. In addition, drug design increasingly needs chemical reactions to serve as editing tools for modifying a given structure.

#### Solutions

1. **Self-Supervised Contrastive Learning Framework**: Through specific self-supervised tasks (such as active center prediction and pairing main reactants with by-products), the framework captures the fundamental rules of chemical reactions and embeds this knowledge into the model.
2. **Conditional Generation Model**: A template-free conditional generation model produces multiple synthesizable analogs from a given seed structure, removing the dependence on a limited set of templates during generation.
3. **Comprehensive Evaluation**: Across multiple downstream tasks (classification, retrieval, and generation), the framework demonstrates superior performance in capturing chemical rules and in generating high-quality drug-like structures.

In summary, this paper proposes a unified framework that both improves chemical reaction representation learning and performs well on molecular generation tasks, showing broad application potential in fields such as drug design.
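The summary does not spell out the pretraining objective. As a minimal sketch, assuming a standard InfoNCE-style contrastive loss (a common choice, not confirmed here as the authors' exact formulation), the pairing task can be pictured as pulling each main reactant's embedding toward the embedding of its own by-product while pushing it away from the other by-products in the same batch. The embeddings, the `cosine` and `info_nce` helpers, and the temperature value are all hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss: anchors[i] should match positives[i];
    every other positives[j] in the batch serves as a negative."""
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same batch size")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability of the matching pair.
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy, made-up 3-dimensional embeddings for two main reactants
# and their (hypothetical) by-products.
reactants = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
swapped = [aligned[1], aligned[0]]

# Correct pairings should give a lower loss than swapped pairings.
print(info_nce(reactants, aligned) < info_nce(reactants, swapped))  # True
```

The loss is small when each reactant is most similar to its own partner and grows when partners are mismatched, which is the behavior a pairing-style contrastive task relies on. A real model would compute the embeddings with a learned encoder and minimize this loss by gradient descent; neither is shown here.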