Large Language Models as Molecular Design Engines

Debjyoti Bhattacharya, Wesley Reinhart, Harrison Cassady, Michael Hickner
DOI: https://doi.org/10.26434/chemrxiv-2024-n0l8q-v2
2024-05-21
Abstract: The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pre-trained Large Language Models (LLMs) have emerged as potential tools for molecular design, as they appear to be capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with an impressive 97% valid and unique molecules. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model’s behavior under different prompting conditions. Notably, the model is able to perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple, natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.
Chemistry
What problem does this paper attempt to address?
This paper examines whether pre-trained large language models (LLMs) can serve as tools for small-molecule design. The authors show that Claude 3 Opus can read, write, and modify molecules in response to natural-language prompts, and report that 97% of the generated molecules are valid and unique. By quantifying the modifications in a low-dimensional latent space, they systematically evaluate the model's behavior under different prompting conditions. They also show that the model can perform guided molecular generation when asked, through simple natural-language prompts, to manipulate the electronic structure of molecules. The paper frames this against existing generative machine learning methods, which are hindered by complex training procedures and often fail to generate valid and unique molecules. The results highlight the potential of LLMs as powerful and versatile molecular design engines.
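To make the "valid and unique" figure concrete, the following is a minimal sketch of how such a fraction could be computed over a batch of generated molecule strings. It is not the authors' evaluation code, and the paper's exact definition of the metric is not given here. The names `valid_unique_fraction`, `toy_canonicalize`, and `samples` are hypothetical, and the toy validity check (balanced parentheses only) is a stand-in for a real cheminformatics parser.

```python
from typing import Callable, Iterable, Optional


def valid_unique_fraction(
    generated: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> float:
    """Return the share of generated strings that are valid and not duplicates.

    `canonicalize` must return a canonical form for a parseable molecule
    string, or None when the string cannot be parsed.
    """
    total = 0
    seen = set()
    for smiles in generated:
        total += 1
        canonical = canonicalize(smiles)
        if canonical is not None:
            # Duplicates collapse onto the same canonical form.
            seen.add(canonical)
    return len(seen) / total if total else 0.0


def toy_canonicalize(smiles: str) -> Optional[str]:
    """Hypothetical stand-in for a real parser; this is not real chemistry.

    Rejects empty strings and unbalanced parentheses, and otherwise
    returns the stripped input unchanged.
    """
    s = smiles.strip()
    if not s:
        return None
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return s if depth == 0 else None


# Illustrative inputs only: one duplicate, one unbalanced string, one empty string.
samples = ["CCO", "CCO", "c1ccccc1", "CC(C", ""]
score = valid_unique_fraction(samples, toy_canonicalize)
print(f"valid and unique: {score:.0%}")
```

In practice the `canonicalize` argument would likely wrap a cheminformatics toolkit. With RDKit, for example, `Chem.MolFromSmiles` returns `None` for strings it cannot parse and `Chem.MolToSmiles` produces a canonical SMILES string by default, so two different spellings of the same molecule count once.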