Abstract: The design of functional transition metal complexes (TMCs) is hindered by the combinatorial explosion of the search space spanned by various metals and ligands, necessitating efficient multi-objective optimization strategies. Traditional genetic algorithms (GAs) are frequently employed in this domain, utilizing random mutations and crossovers steered by explicit mathematical objective formulations to navigate the search space. The transfer and sharing of knowledge across different GA optimization tasks, however, remain challenging. Here, we introduce the integration of large language models (LLMs) into the evolutionary optimization framework (LLM-EO) for TMCs. LLM-EO significantly outperforms traditional GAs due to the intrinsic chemical knowledge embedded within LLMs, acquired during their extensive pretraining. Notably, without the need for supervised fine-tuning, LLMs can leverage the entirety of historical data amassed during the optimization processes, demonstrating superior performance compared to LLMs that are limited to the best TMCs identified in the evolutionary cycle. Specifically, LLM-EO identifies eight out of the top 20 TMCs with the largest HOMO-LUMO gaps by interrogating merely 200 candidates within a vast search space of 1.37 million TMCs. Through prompt engineering using natural language, LLM-EO introduces unparalleled flexibility in multi-objective optimizations, especially when guided by seasoned researchers, thereby circumventing the necessity for intricate mathematical formulations. As generative models, LLMs possess the capability to propose novel ligands and TMCs with unique chemical properties by amalgamating both internal knowledge and external chemistry data, thus combining the benefits of efficient optimization and molecular generation. With the increasing potential of LLMs, both in their capacity as pretrained foundational models and new strategies in post-training inference, we anticipate broad applications of LLM-based evolutionary optimization in the fields of chemistry and materials design.
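
To make the idea of an LLM-driven evolutionary loop concrete, the following is a minimal sketch, not the authors' implementation. It assumes a TMC can be encoded as a tuple of ligand identifiers, and it replaces both the LLM and the quantum-chemical property calculation with toy stand-ins. The names `build_prompt`, `llm_eo`, `mock_propose`, `toy_gap`, and the ligand pool `LIGANDS` are all hypothetical and introduced only for illustration. The one feature taken from the abstract is that the prompt carries the entire evaluation history rather than only the best candidates.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, ...]  # hypothetical encoding: a TMC as a tuple of ligand IDs

LIGANDS = ["L1", "L2", "L3", "L4", "L5", "L6"]  # placeholder ligand pool
_RNG = random.Random(0)  # fixed seed so the toy run is reproducible


def toy_gap(cand: Candidate) -> float:
    """Placeholder objective standing in for a computed HOMO-LUMO gap."""
    return float(sum(LIGANDS.index(lig) for lig in cand))


def build_prompt(history: Dict[Candidate, float], n_new: int) -> str:
    """Serialize the full evaluation history, best first, into a text prompt."""
    ranked = sorted(history.items(), key=lambda kv: -kv[1])
    lines = [f"{','.join(c)} -> gap={score:.3f}" for c, score in ranked]
    return (
        "Known complexes and their HOMO-LUMO gaps:\n"
        + "\n".join(lines)
        + f"\nPropose {n_new} new complexes with larger gaps."
    )


def mock_propose(prompt: str, n_new: int) -> List[Candidate]:
    """Stand-in for an LLM call: mutates one ligand of the best listed complex.

    A real setup would send `prompt` to a language model and parse its reply.
    """
    best = prompt.splitlines()[1].split(" -> ")[0].split(",")
    children = []
    for _ in range(n_new):
        child = list(best)
        child[_RNG.randrange(len(child))] = _RNG.choice(LIGANDS)
        children.append(tuple(child))
    return children


def llm_eo(
    propose: Callable[[str, int], List[Candidate]],
    evaluate: Callable[[Candidate], float],
    initial: List[Candidate],
    generations: int,
    n_new: int,
) -> Dict[Candidate, float]:
    """Run the loop: prompt with all history, propose, evaluate unseen candidates."""
    history = {c: evaluate(c) for c in initial}
    for _ in range(generations):
        prompt = build_prompt(history, n_new)
        for cand in propose(prompt, n_new):
            if cand not in history:  # never spend an evaluation on a repeat
                history[cand] = evaluate(cand)
    return history
```

A call such as `llm_eo(mock_propose, toy_gap, [("L1", "L1", "L1", "L1")], generations=5, n_new=4)` returns a dictionary mapping every evaluated candidate to its score. In this sketch, changing the optimization target means editing the final sentence of the prompt rather than rewriting a fitness function, which is the kind of natural-language flexibility the abstract describes.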

Generative Design of Functional Metal Complexes Utilizing the Internal Knowledge of Large Language Models