Abstract: The design of functional transition metal complexes (TMCs) is hindered by the combinatorial explosion of the search space spanned by the possible metals and ligands, which calls for efficient multi-objective optimization strategies. Traditional genetic algorithms (GAs) are frequently employed in this domain; they navigate the search space through random mutations and crossovers steered by explicit mathematical objective functions. Transferring and sharing knowledge across different GA optimization tasks, however, remains challenging. Here, we integrate large language models (LLMs) into an evolutionary optimization framework (LLM-EO) for TMCs. LLM-EO significantly outperforms traditional GAs owing to the intrinsic chemical knowledge that LLMs acquire during their extensive pretraining. Notably, without any supervised fine-tuning, LLMs can leverage the entire history of data accumulated during the optimization process, and in doing so outperform LLMs that are limited to the best TMCs identified in the evolutionary cycle. Specifically, LLM-EO identifies eight of the top 20 TMCs with the largest HOMO-LUMO gaps by examining only 200 candidates within a search space of 1.37 million TMCs. Through prompt engineering in natural language, LLM-EO introduces unparalleled flexibility in multi-objective optimization, especially when guided by experienced researchers, and thereby removes the need for intricate mathematical formulations. As generative models, LLMs can also propose novel ligands and TMCs with unique chemical properties by combining their internal knowledge with external chemistry data, uniting the benefits of efficient optimization and molecular generation. Given the growing potential of LLMs, both as pretrained foundation models and through new strategies for post-training inference, we anticipate broad applications of LLM-based evolutionary optimization in chemistry and materials design.
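
The loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the prompts nor the evaluation pipeline, so every name here (`toy_gap`, `mutation_proposer`, `evolve`, the ligand labels and their scores) is hypothetical. The proposer is a plain random-mutation stand-in that sees the full evaluation history; in an LLM-driven variant, that function would presumably format the history into a natural-language prompt and parse candidates from the model's reply.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, ...]       # toy "complex": one ligand label per site
History = Dict[Candidate, float]  # every evaluated candidate and its score

# Made-up per-ligand scores; NOT real chemistry data.
TOY_SCORES = {"A": 0.5, "B": 1.0, "C": 1.5, "D": 2.0}


def toy_gap(candidate: Candidate) -> float:
    """Stand-in objective (not a real HOMO-LUMO gap calculation)."""
    return sum(TOY_SCORES[ligand] for ligand in candidate)


def mutation_proposer(history: History, ligands: List[str], n: int,
                      rng: random.Random) -> List[Candidate]:
    """Placeholder for the LLM call: mutate one site of a well-scoring parent."""
    ranked = sorted(history, key=lambda c: history[c], reverse=True)
    parents = ranked[: max(1, len(ranked) // 2)]
    proposals = []
    for _ in range(n):
        child = list(rng.choice(parents))
        child[rng.randrange(len(child))] = rng.choice(ligands)
        proposals.append(tuple(child))
    return proposals


def evolve(proposer: Callable[[History, List[str], int, random.Random],
                              List[Candidate]],
           objective: Callable[[Candidate], float],
           ligands: List[str], sites: int = 4, generations: int = 10,
           batch: int = 5, seed: int = 0) -> History:
    """Run the propose/evaluate loop and return the full history."""
    rng = random.Random(seed)
    history: History = {}
    # Random initial population.
    for _ in range(batch):
        candidate = tuple(rng.choice(ligands) for _ in range(sites))
        history[candidate] = objective(candidate)
    # Each generation, the proposer sees the whole history, not just the best.
    for _ in range(generations):
        for candidate in proposer(history, ligands, batch, rng):
            if candidate not in history:  # skip already-evaluated candidates
                history[candidate] = objective(candidate)
    return history


history = evolve(mutation_proposer, toy_gap, ligands=list(TOY_SCORES))
best = max(history, key=lambda c: history[c])
print(best, history[best])
```

Passing the proposer in as a function keeps the evaluation loop unchanged when the mutation stand-in is swapped for a model-backed one, and returning the full history mirrors the abstract's point that all accumulated data, not only the top candidates, is available to the proposer.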

Generative Design of Functional Metal Complexes Utilizing the Internal Knowledge of Large Language Models
