3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Rui Yan
2024-06-09
Abstract: The integration of molecule and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for the comprehensive modeling of molecule and language. However, existing works exhibit notable limitations. Most overlook the modeling of 3D information, which is crucial for understanding molecular structures and functions. While some attempts have been made to leverage external structure encoding modules to inject 3D molecular information into LMs, obvious difficulties, such as modality alignment and separate tuning, hinder the integration of molecular structure and language text. To bridge this gap, we propose 3D-MolT5, a unified framework designed to model both 1D molecular sequence and 3D molecular structure. The key innovation lies in our methodology for mapping fine-grained 3D substructure representations (based on 3D molecular fingerprints) to a specialized 3D token vocabulary for 3D-MolT5. This 3D structure token vocabulary enables the seamless combination of 1D sequence and 3D structure representations in a tokenized format, allowing 3D-MolT5 to encode molecular sequence (SELFIES), molecular structure, and text sequences within a unified architecture. In addition, we introduce 1D and 3D joint pre-training to enhance the model's comprehension of these diverse modalities in a joint representation space and to help our foundation model generalize better to various tasks. Through instruction tuning on multiple downstream datasets, our proposed 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation tasks. Our code will be available on GitHub soon.
Biomolecules; Artificial Intelligence; Computational Engineering, Finance, and Science; Computation and Language; Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of jointly modeling molecules and language in molecular science. Most existing methods overlook three-dimensional (3D) information, which is crucial for understanding molecular structure and function, and those that inject it through external structure encoders face difficulties such as modality alignment and separate tuning. To address this, the paper proposes 3D-MolT5, a unified framework that encodes 1D molecular sequences (SELFIES), 3D molecular structures, and text within a single architecture. 3D-MolT5 maps fine-grained 3D substructure representations, obtained with the 3D molecular fingerprint algorithm E3FP, to a dedicated 3D token vocabulary, so that 1D sequence and 3D structure representations can be combined in tokenized form. The model is further pre-trained jointly on 1D and 3D data to improve its understanding of the modalities and its generalization across tasks. After instruction tuning on multiple downstream datasets, 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation.
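The idea of mapping 3D substructure identifiers to a token vocabulary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary size, the `<3d_N>` token format, the modulo folding, the one-identifier-per-token pairing, and the names `fingerprint_id_to_token` and `tokenize_molecule` are all assumptions made for illustration, and the integer identifiers are made up rather than produced by an actual E3FP run.

```python
from typing import Dict, List

# Hypothetical size of the 3D token vocabulary (an assumption, not from the paper).
VOCAB_SIZE_3D = 4096


def fingerprint_id_to_token(identifier: int, vocab_size: int = VOCAB_SIZE_3D) -> str:
    """Fold an integer substructure identifier into a fixed vocabulary slot."""
    return f"<3d_{identifier % vocab_size}>"


def tokenize_molecule(selfies_tokens: List[str], atom_ids: List[int]) -> List[Dict[str, str]]:
    """Pair each 1D sequence token with a 3D structure token.

    Assumes exactly one substructure identifier per 1D token, which is a
    simplification chosen for this sketch.
    """
    if len(selfies_tokens) != len(atom_ids):
        raise ValueError("expected one substructure identifier per 1D token")
    return [
        {"seq": tok, "struct": fingerprint_id_to_token(ident)}
        for tok, ident in zip(selfies_tokens, atom_ids)
    ]


# Made-up identifiers for illustration only; not real fingerprint output.
example = tokenize_molecule(["[C]", "[C]", "[O]"], [918273645, 5000, 42])
```

Here each position carries both a sequence token and a structure token, which is one simple way a single model could consume the two modalities through a shared tokenized input.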