Abstract: The design of functional transition metal complexes (TMCs) is hindered by the combinatorial explosion of the search space spanned by the various metals and ligands, which calls for efficient multi-objective optimization strategies. Traditional genetic algorithms (GAs) are frequently employed in this domain; they navigate the search space with random mutations and crossovers steered by explicit mathematical formulations of the objective. Transferring and sharing knowledge across different GA optimization tasks, however, remains challenging. Here, we introduce LLM-EO, which integrates large language models (LLMs) into an evolutionary optimization framework for TMCs. LLM-EO significantly outperforms traditional GAs owing to the chemical knowledge that LLMs acquire during their extensive pretraining. Notably, without any supervised fine-tuning, LLMs can leverage the entire history of data amassed during the optimization, and they perform better this way than LLMs that are shown only the best TMCs identified during the evolutionary cycle. Specifically, LLM-EO identifies eight of the top 20 TMCs with the largest HOMO-LUMO gaps after evaluating only 200 candidates from a search space of 1.37 million TMCs. Through prompt engineering in natural language, LLM-EO offers unparalleled flexibility in multi-objective optimization, especially when guided by experienced researchers, and avoids the need for intricate mathematical formulations. As generative models, LLMs can also propose novel ligands and TMCs with unique chemical properties by combining their internal knowledge with external chemistry data, thus uniting the benefits of efficient optimization and molecular generation.
Given the growing capabilities of LLMs, both as pretrained foundation models and through new post-training and inference strategies, we anticipate broad applications of LLM-based evolutionary optimization in chemistry and materials design.
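The abstract contrasts a GA proposal step (random mutation and crossover steered by an explicit objective) with an LLM proposal step that can see the whole optimization history. A minimal sketch of that shared loop follows, under loudly stated assumptions: the metal and ligand names, the four-slot complex, and `toy_gap` are hypothetical placeholders (`toy_gap` is an arbitrary score, not a HOMO-LUMO gap calculation), and no LLM is called. In an LLM-EO-style variant, the `propose` step would instead send something like `build_prompt(history)` to a model and parse the complexes it returns; that call is omitted because any specific LLM API here would be an invention.

```python
import random
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL toy search space: one metal centre plus four ligand slots.
# These names are placeholders, not the metals/ligands used in LLM-EO.
METALS = ["Pd", "Pt", "Ni"]
LIGANDS = [f"L{i}" for i in range(8)]
N_SLOTS = 4

TMC = Tuple[str, Tuple[str, ...]]  # (metal, ligands)
Proposer = Callable[[Dict[TMC, float], int, random.Random], List[TMC]]


def toy_gap(tmc: TMC) -> float:
    """HYPOTHETICAL objective standing in for a computed HOMO-LUMO gap.
    It is an arbitrary deterministic score, NOT a chemical model."""
    metal, ligands = tmc
    return 0.5 * METALS.index(metal) + 0.1 * sum(int(lig[1:]) for lig in ligands)


def random_tmc(rng: random.Random) -> TMC:
    return rng.choice(METALS), tuple(rng.choice(LIGANDS) for _ in range(N_SLOTS))


def mutate(tmc: TMC, rng: random.Random) -> TMC:
    """GA mutation: replace the metal or one randomly chosen ligand."""
    metal, ligands = tmc
    if rng.random() < 0.25:
        return rng.choice(METALS), ligands
    new = list(ligands)
    new[rng.randrange(N_SLOTS)] = rng.choice(LIGANDS)
    return metal, tuple(new)


def crossover(a: TMC, b: TMC, rng: random.Random) -> TMC:
    """Uniform crossover: each slot is inherited from one of the two parents."""
    metal = rng.choice([a[0], b[0]])
    return metal, tuple(rng.choice(pair) for pair in zip(a[1], b[1]))


def ga_propose(history: Dict[TMC, float], n: int, rng: random.Random) -> List[TMC]:
    """Baseline proposal step: breed n children from the best evaluated TMCs."""
    parents = sorted(history, key=history.get, reverse=True)[:4]
    return [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(n)]


def build_prompt(history: Dict[TMC, float], top_k: int = 0) -> str:
    """Serialize evaluated TMCs for an LLM-based proposal step.
    top_k=0 includes the full history; top_k>0 keeps only the best top_k."""
    ranked = sorted(history.items(), key=lambda kv: kv[1], reverse=True)
    if top_k:
        ranked = ranked[:top_k]
    lines = [f"{m} {'-'.join(ligs)}: gap={g:.2f}" for (m, ligs), g in ranked]
    return ("Evaluated complexes:\n" + "\n".join(lines)
            + "\nPropose new complexes with larger HOMO-LUMO gaps.")


def optimize(propose: Proposer, budget: int = 200, batch: int = 20,
             seed: int = 0, max_rounds: int = 1000) -> Dict[TMC, float]:
    """Evolutionary loop; only `propose` differs between a GA and an LLM variant."""
    rng = random.Random(seed)
    history: Dict[TMC, float] = {}
    candidates = [random_tmc(rng) for _ in range(batch)]
    for _ in range(max_rounds):  # guard against proposers that stall on duplicates
        for tmc in candidates:
            if tmc not in history and len(history) < budget:
                history[tmc] = toy_gap(tmc)  # the expensive calculation in practice
        if len(history) >= budget:
            break
        candidates = propose(history, batch, rng)
    return history


if __name__ == "__main__":
    hist = optimize(ga_propose)
    best = max(hist, key=hist.get)
    print(len(hist), best, round(hist[best], 2))
    print(build_prompt(hist, top_k=3))
```

Passing a nonzero `top_k` to `build_prompt` mimics the restricted setting in which the model sees only the best complexes found so far, while `top_k=0` keeps the full history, the setting the abstract reports as performing better. The evaluation budget of 200 mirrors the number of candidates quoted above, but the toy space here is far smaller than 1.37 million TMCs.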