ChatMol: Interactive Molecular Discovery with Natural Language

Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, Zhiyuan Liu
DOI: https://doi.org/10.1093/bioinformatics/btae534
IF: 5.8
2024-09-02
Bioinformatics
Abstract: Motivation: Natural language is poised to become a key medium for human-machine interaction in the era of large language models. In biochemistry, tasks such as property prediction and molecule mining are critically important yet technically challenging. Bridging molecular expressions in natural language and chemical language can significantly improve the interpretability and ease of these tasks. Moreover, it can integrate chemical knowledge from various sources, leading to a deeper understanding of molecules. Results: Recognizing these advantages, we introduce the concept of conversational molecular design, a novel task that uses natural language to describe and edit target molecules. To better accomplish this task, we develop ChatMol, a knowledgeable and versatile generative pre-trained model. The model is enhanced by incorporating experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. Several typical solutions, including large language models (e.g., ChatGPT), are evaluated, demonstrating the difficulty of conversational molecular design and the effectiveness of our knowledge enhancement approach. Case observations and analysis offer insights and directions for further exploration of natural-language interaction in molecular discovery. Availability and implementation: Code and data are available at https://github.com/Ellenzzn/ChatMol/tree/main. Supplementary information: Supplementary data are available online.