Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule

Yi Xiao, Xiangxin Zhou, Qiang Liu, Liang Wang
2024-03-07
Abstract: Artificial intelligence has demonstrated immense potential in scientific research. Within molecular science, it is revolutionizing the traditional computer-aided paradigm, ushering in a new era of deep learning. With recent progress in multimodal learning and natural language processing, an emerging trend aims at building multimodal frameworks to jointly model molecules with textual domain knowledge. In this paper, we present the first systematic survey on multimodal frameworks for molecular research. Specifically, we begin with the development of molecular deep learning and point out the necessity of involving the textual modality. Next, we focus on recent advances in text-molecule alignment methods, categorizing current models into two groups based on their architectures and listing relevant pre-training tasks. Furthermore, we delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. Finally, we discuss the limitations in this field and highlight several promising directions for future research.
Biomolecules, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to integrate textual and molecular information in multimodal frameworks that support molecular research, particularly drug discovery. Deep learning has revolutionized traditional computer-aided methods in molecular science, but existing deep learning models have a limited understanding of chemical knowledge and rely on annotated data. The paper argues that recent advances in multimodal learning and natural language processing offer new ways to connect text and molecules. It describes two main approaches: one treats molecules as a language with its own special grammar and uses cross-lingual models to process text and molecules jointly; the other explores alignment between text and structured molecular data and integrates large language models into the multimodal framework for cross-modal prediction on molecular tasks. The paper also notes that applying prompt engineering techniques during training can yield good results on many molecular tasks without requiring large amounts of pretraining data. It categorizes current work, discusses training strategies, dataset construction methods, and relevant applications, and analyzes the limitations of the field, pointing out several promising directions for future research. Overall, the authors present the paper as the first systematic survey of multimodal frameworks in molecular research, aiming to summarize recent progress and outline future research prospects.