Generative Design of Functional Metal Complexes Utilizing the Internal Knowledge of Large Language Models

Chenru Duan, Jieyu Lu, Zhangde Song, Qiyuan Zhao, Yuanqi Du, Haojun Jia, Yirui Cao
DOI: https://doi.org/10.26434/chemrxiv-2024-z29m3
2024-10-24
Abstract: The design of functional transition metal complexes (TMCs) is hindered by the combinatorial explosion of the search space spanned by various metals and ligands, necessitating efficient multi-objective optimization strategies. Traditional genetic algorithms (GAs) are frequently employed in this domain, utilizing random mutations and crossovers steered by explicit mathematical objective formulations to navigate the search space. The transfer and sharing of knowledge across different GA optimization tasks, however, remain challenging. Here, we introduce the integration of large language models (LLMs) into the evolutionary optimization framework (LLM-EO) for TMCs. LLM-EO significantly outperforms traditional GAs due to the intrinsic chemical knowledge embedded within LLMs, acquired during their extensive pretraining. Notably, without the need for supervised fine-tuning, LLMs can leverage the entirety of historical data amassed during the optimization processes, demonstrating superior performance compared to LLMs that are limited to the best TMCs identified in the evolutionary cycle. Specifically, LLM-EO identifies eight out of the top 20 TMCs with the largest HOMO-LUMO gaps by interrogating merely 200 candidates within a vast search space of 1.37 million TMCs. Through prompt engineering using natural language, LLM-EO introduces unparalleled flexibility in multi-objective optimizations, especially when guided by seasoned researchers, thereby circumventing the necessity for intricate mathematical formulations. As generative models, LLMs possess the capability to propose novel ligands and TMCs with unique chemical properties by amalgamating both internal knowledge and external chemistry data, thus combining the benefits of efficient optimization and molecular generation.
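To make the contrast with traditional GAs concrete, here is a minimal sketch of a GA of the kind the abstract describes: random mutation and crossover steered by an explicit objective. Everything in it is an illustrative assumption: the metal and ligand pools, the four-site representation, and `toy_gap`, a made-up additive score standing in for a quantum-chemistry HOMO-LUMO gap calculation. None of these names come from the paper.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical building blocks for illustration only; not the paper's search space.
METALS = ["Fe", "Co", "Ni", "Cu"]
LIGANDS = ["NH3", "H2O", "CO", "CN", "Cl", "PH3"]
N_SITES = 4  # assumed number of ligand positions per complex

TMC = Tuple[str, Tuple[str, ...]]  # (metal, ligands)


def random_tmc(rng: random.Random) -> TMC:
    return rng.choice(METALS), tuple(rng.choice(LIGANDS) for _ in range(N_SITES))


def mutate(tmc: TMC, rng: random.Random) -> TMC:
    """Random mutation: swap the metal (25% of the time) or one ligand."""
    metal, ligands = tmc
    if rng.random() < 0.25:
        return rng.choice(METALS), ligands
    new = list(ligands)
    new[rng.randrange(N_SITES)] = rng.choice(LIGANDS)
    return metal, tuple(new)


def crossover(a: TMC, b: TMC, rng: random.Random) -> TMC:
    """One-point crossover on the ligand lists; metal inherited from either parent."""
    cut = rng.randrange(1, N_SITES)
    return rng.choice([a[0], b[0]]), a[1][:cut] + b[1][cut:]


def toy_gap(tmc: TMC) -> float:
    """Hypothetical placeholder for a computed HOMO-LUMO gap (explicit objective)."""
    metal_term = {"Fe": 0.5, "Co": 0.8, "Ni": 1.2, "Cu": 1.0}
    ligand_term = {"NH3": 0.6, "H2O": 0.4, "CO": 1.1, "CN": 1.0, "Cl": 0.2, "PH3": 0.7}
    metal, ligands = tmc
    return metal_term[metal] + sum(ligand_term[lig] for lig in ligands)


def run_ga(fitness: Callable[[TMC], float], pop_size: int = 20,
           generations: int = 10, seed: int = 0):
    rng = random.Random(seed)
    population = [random_tmc(rng) for _ in range(pop_size)]
    history: Dict[TMC, float] = {}  # every candidate scored so far

    def score(pop):
        # Evaluate only candidates not seen before (evaluations are the costly step).
        for tmc in pop:
            if tmc not in history:
                history[tmc] = fitness(tmc)

    for _ in range(generations):
        score(population)
        # Truncation selection: keep the better half as parents.
        parents = sorted(population, key=history.__getitem__, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    score(population)
    best = max(history, key=history.__getitem__)
    return best, history[best], history
```

Calling `run_ga(toy_gap)` returns the best candidate, its score, and the full `history` dictionary. Roughly speaking, LLM-EO keeps this outer loop but replaces the `mutate`/`crossover` step with LLM proposals; `history` is the kind of record such proposals could be conditioned on.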
With the increasing potential of LLMs, both in their capacity as pretrained foundational models and new strategies in post-training inference, we anticipate broad applications of LLM-based evolutionary optimization in the fields of chemistry and materials design.
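The abstract's point that the LLM can use the whole optimization history, rather than only the best TMCs found so far, without supervised fine-tuning, amounts to placing that history in the model's context. The sketch below shows one hypothetical way to assemble such a prompt and to filter the reply. `build_prompt`, `parse_proposals`, the tab-separated line format, and the `Pd|...` candidate strings are assumptions, not the authors' prompt, and no LLM is actually called. The `top_k` argument switches between the full-history and best-only variants that the abstract compares.

```python
from typing import Dict, List, Optional


def build_prompt(history: Dict[str, float], objective: str,
                 n_new: int = 10, top_k: Optional[int] = None) -> str:
    """Serialize scored candidates into a natural-language prompt (hypothetical format).

    history: candidate string -> evaluated score.
    top_k:   None shows the full history; an int shows only the k best.
    """
    ranked = sorted(history.items(), key=lambda kv: kv[1], reverse=True)
    shown = ranked if top_k is None else ranked[:top_k]
    lines = [
        "You are helping design transition metal complexes (TMCs).",
        f"Objective: {objective}",
        f"Previously evaluated TMCs ({len(shown)} shown), best first:",
    ]
    lines += [f"{name}\t{value:.3f}" for name, value in shown]
    lines.append(
        f"Propose {n_new} new TMCs that are not in the list and are likely to "
        "better satisfy the objective. Return one TMC per line."
    )
    return "\n".join(lines)


def parse_proposals(reply: str, history: Dict[str, float]) -> List[str]:
    """Keep non-empty, previously unseen candidates from the model's reply."""
    seen = set(history)
    proposals = []
    for line in reply.splitlines():
        cand = line.strip()
        if cand and cand not in seen:
            seen.add(cand)
            proposals.append(cand)
    return proposals


if __name__ == "__main__":
    # Made-up candidates and scores, for illustration only.
    demo = {"Pd|NH3|NH3|Cl|Cl": 3.1, "Pd|CO|CO|Cl|Cl": 3.4}
    print(build_prompt(demo, "Maximize the HOMO-LUMO gap.", n_new=5))
```

Because only the prompt changes between iterations, the model weights stay fixed, which is what "without supervised fine-tuning" implies here.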
Chemistry
What problem does this paper attempt to address?
This paper addresses several key issues in the design of functional transition metal complexes (TMCs):
1. **Combinatorial explosion**: The many possible combinations of metals and ligands create a design space too large for traditional methods to explore systematically. This is especially acute for TMCs, where the diversity of metals and ligands makes the space vast.
2. **Multi-objective optimization**: Designing functional materials usually requires optimizing several target properties at once, such as catalytic activity and stability. A designed TMC may need favorable electronic structure (for example, a large HOMO-LUMO gap) as well as other properties, such as a specific polarizability. Traditional genetic algorithms (GAs) struggle with such multi-objective problems, particularly in balancing competing objectives.
3. **Knowledge transfer and sharing**: Transferring and sharing existing knowledge between different optimization tasks is difficult. Traditional GAs lack an effective mechanism for using information in historical data to make new tasks more efficient.

To address these problems, the study introduces a new evolutionary optimization framework, LLM-EO (Large Language Model-driven Evolutionary Optimization). By combining large language models (LLMs) with evolutionary optimization algorithms, LLM-EO not only draws on the chemical knowledge embedded in LLMs but also learns from all data accumulated during the optimization process without supervised fine-tuning, and it performs well in both single-objective and multi-objective optimization tasks.
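Item 2 mentions the difficulty of balancing objectives under an explicit formulation. Two common explicit formulations are Pareto dominance and a weighted sum. The sketch below assumes, purely for illustration, that both the HOMO-LUMO gap and the polarizability are to be maximized; the candidate names and values are made up. It shows that the weighted-sum winner flips when the weights change, which is the kind of hand-tuning that stating objectives in natural language is meant to reduce.

```python
from typing import Dict, Sequence, Tuple

# (HOMO-LUMO gap, polarizability); both assumed to be maximized in this sketch.
Objectives = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(cands: Dict[str, Objectives]) -> Dict[str, Objectives]:
    """Return the candidates that no other candidate dominates."""
    return {
        name: obj
        for name, obj in cands.items()
        if not any(dominates(other, obj) for other in cands.values())
    }


def weighted_sum_best(cands: Dict[str, Objectives], weights: Objectives) -> str:
    """Scalarize with fixed weights and return the single best candidate."""
    return max(cands, key=lambda n: sum(w * x for w, x in zip(weights, cands[n])))


# Hypothetical candidates with made-up values, for illustration only.
cands = {
    "tmc1": (3.1, 250.0),
    "tmc2": (2.5, 400.0),
    "tmc3": (2.4, 300.0),
    "tmc4": (3.3, 200.0),
}

if __name__ == "__main__":
    print(sorted(pareto_front(cands)))
    print(weighted_sum_best(cands, (1.0, 0.01)))
    print(weighted_sum_best(cands, (1.0, 0.001)))
```

Here `pareto_front(cands)` keeps `tmc1`, `tmc2`, and `tmc4` (`tmc3` is dominated by `tmc2`), while the weighted sum picks `tmc2` with weights `(1.0, 0.01)` but `tmc4` with `(1.0, 0.001)`.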
Specifically, LLM-EO identifies 8 of the top 20 TMCs with the largest HOMO-LUMO gaps in a design space of 1.37 million TMCs while evaluating only 200 candidates, demonstrating efficient search and strong adaptability to complex chemical spaces. In addition, because objectives can be expressed as natural-language instructions, LLM-EO offers considerable flexibility in multi-objective optimization, reduces the dependence on intricate mathematical formulations, and can generate new ligands and TMCs, which can accelerate the optimization process.
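To put the reported search efficiency in perspective, the following back-of-the-envelope calculation compares it with uniform random sampling. The random baseline is an illustrative comparison added here, not a result reported in this section, and it treats the 1.37 million figure as exact although it is rounded.

```python
# Figures as quoted in the text; 1.37 million is a rounded value.
N_SPACE = 1_370_000  # TMCs in the design space
N_EVAL = 200         # candidates evaluated by LLM-EO
K_TOP = 20           # size of the reference "top" set
HITS = 8             # top-20 TMCs recovered by LLM-EO


def fraction_evaluated(n_eval: int, n_space: int) -> float:
    return n_eval / n_space


def expected_random_hits(n_eval: int, n_space: int, k_top: int) -> float:
    """Hypergeometric mean: expected top-k hits in n_eval uniform draws without replacement."""
    return n_eval * k_top / n_space


def prob_any_random_hit(n_eval: int, n_space: int, k_top: int) -> float:
    """Probability that uniform sampling without replacement finds at least one top-k TMC."""
    p_none = 1.0
    for i in range(n_eval):
        p_none *= (n_space - k_top - i) / (n_space - i)
    return 1.0 - p_none


if __name__ == "__main__":
    print(f"fraction of space evaluated: {fraction_evaluated(N_EVAL, N_SPACE):.4%}")
    print(f"LLM-EO top-{K_TOP} recall: {HITS / K_TOP:.0%}")
    print(f"expected random hits: {expected_random_hits(N_EVAL, N_SPACE, K_TOP):.4f}")
    print(f"P(random finds >= 1): {prob_any_random_hit(N_EVAL, N_SPACE, K_TOP):.4f}")
```

With these numbers, about 0.015% of the space is evaluated, and 200 random draws would be expected to contain only about 0.003 of the top-20 TMCs, compared with the 8 (a 40% recall) reported for LLM-EO.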