Abstract: Generating novel active molecules for a given protein is an extremely challenging task for generative models, as it requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such a simple conceptual approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.

What problem does this paper attempt to address?

This paper introduces BindGPT, a framework for scalable 3D molecular design that combines language modeling with reinforcement learning. Generating novel active molecules for a given protein is difficult for generative models because it requires an understanding of the complex physical interactions between a molecule and its environment. BindGPT creates 3D molecules directly within a protein's binding site, producing the molecular graph and the conformation jointly so that no separate graph reconstruction step is needed.

The model is first pretrained on a large-scale dataset and then fine-tuned with reinforcement learning, using scores from external simulation software as the reward. A single pretrained model can serve as an unconditional 3D molecule generator, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. The model makes no equivariance assumptions about the domain of generation.

The paper notes that although existing methods can generate 3D molecules directly, most of them rely on external tools to assign bonds, which may introduce errors. BindGPT instead describes the molecular graph in SMILES and the atomic positions in XYZ format, which reduces the dependence on external software. Experimental results show that BindGPT performs on par with or better than state-of-the-art diffusion models, language models, and graph neural networks on 3D molecular generation tasks while being two orders of magnitude cheaper to sample. In addition, after reinforcement learning fine-tuning, the model can find structures with high binding scores for any given protein.

In short, the paper addresses the problem of generating active 3D molecules more effectively, in particular molecules that account for their interactions with a protein, while reducing reliance on external tools and improving the accuracy and efficiency of generation.
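
To make the idea of describing a molecule as SMILES plus XYZ text more concrete, here is a minimal sketch of such a serialization. It is not BindGPT's actual format or tokenizer: the `<SMILES>` and `<XYZ>` tags, the function names `to_text` and `from_text`, the three-decimal rounding, and the heavy-atom-only convention are all assumptions made for illustration, and the coordinates are placeholder values rather than an optimized geometry.

```python
from typing import List, Tuple

# (element symbol, x, y, z); coordinates in angstroms
Atom = Tuple[str, float, float, float]


def to_text(smiles: str, atoms: List[Atom], decimals: int = 3) -> str:
    """Serialize a molecule as one SMILES line followed by an XYZ-style block.

    The tags and layout are hypothetical, chosen only to show how graph and
    geometry can live in a single text sequence.
    """
    lines = [f"<SMILES> {smiles}", "<XYZ>", str(len(atoms))]
    for symbol, x, y, z in atoms:
        lines.append(
            f"{symbol} {x:.{decimals}f} {y:.{decimals}f} {z:.{decimals}f}"
        )
    return "\n".join(lines)


def from_text(text: str) -> Tuple[str, List[Atom]]:
    """Parse the text produced by to_text back into SMILES and atoms."""
    lines = text.strip().split("\n")
    tag, smiles = lines[0].split(" ", 1)
    if tag != "<SMILES>" or lines[1] != "<XYZ>":
        raise ValueError("unexpected layout")
    count = int(lines[2])
    atoms: List[Atom] = []
    for row in lines[3:3 + count]:
        symbol, x, y, z = row.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    return smiles, atoms


# Ethanol, heavy atoms only, listed in the same order as in the SMILES string.
ethanol_atoms: List[Atom] = [
    ("C", 0.0, 0.0, 0.0),
    ("C", 1.52, 0.0, 0.0),
    ("O", 2.05, 1.32, 0.0),
]
sequence = to_text("CCO", ethanol_atoms)
print(sequence)
```

Because the graph (SMILES) and the geometry (XYZ) sit in one sequence, a language model trained on text like this would emit both together, which is the property the summary above highlights: no separate tool is needed to infer bonds from coordinates.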

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

3DSMILES-GPT: 3D Molecular Pocket-based Generation with Token-only Large Language Model

The Future of Molecular Studies Through the Lens of Large Language Models

Advances in Deep Learning-Based 3D Molecular Generative Models

MolGPT: Molecular Generation Using a Transformer-Decoder Model

Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models

LigGPT: Molecular Generation using a Transformer-Decoder Model

Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets

Diffusion-Driven Generative Framework for Molecular Conformation Prediction

MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling

In-Pocket 3D Graphs Enhance Ligand-Target Compatibility in Generative Small-Molecule Creation

cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation

Fragment-Based Ligand Generation Guided By Geometric Deep Learning On Protein-Ligand Structure

Generation of 3D molecules in pockets via a language model

CProMG: controllable protein-oriented molecule generation with desired binding affinity and drug-like properties

Generating 3D molecules conditional on receptor binding sites with deep generative models

Design of Peptide Binders to Conformationally Diverse Targets with Contrastive Language Modeling

Adapt-cMolGPT: A Conditional Generative Pre-Trained Transformer with Adapter-Based Fine-Tuning for Target-Specific Molecular Generation

LS-MolGen: Ligand-and-Structure Dual-Driven Deep Reinforcement Learning for Target-Specific Molecular Generation Improves Binding Affinity and Novelty