Abstract: The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a large language model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands as the pioneering LLM dedicated to the catalyst domain, offering novel avenues for catalyst discovery and development.

What problem does this paper attempt to address?

This paper asks how large language models (LLMs) can advance catalyst design, particularly for electrocatalytic materials. The authors develop a large language model named CataLM to address the shortcomings of existing models in extracting and understanding catalyst knowledge. Through domain pre-training and instruction fine-tuning on electrocatalytic materials, CataLM can better understand and process catalyst-related text, giving scientists a more effective tool for catalyst knowledge exploration and design.

### Main problems:

1. **Complexity and diversity of catalyst knowledge**: Catalyst design involves multiple variables such as synthesis, composition, structure and performance. This information is scattered across a large body of scientific literature, which makes it difficult to extract.
2. **Limitations of existing large language models**: Although existing large language models perform well in general domains, they lack sufficient expertise in the catalyst field and cannot meet its specific requirements.
3. **Data scarcity and annotation difficulties**: High-quality data sets in the catalyst field are relatively scarce and must be annotated by experts, which makes model training harder.

### Solutions:

- **Development of CataLM**: Starting from the Vicuna-13B model, two stages of training (domain pre-training and instruction fine-tuning) give the model a deep understanding of electrocatalytic materials.
- **Domain pre-training**: The model is pre-trained on a large amount of literature on electrocatalytic materials so that it learns catalyst-related terminology and knowledge.
- **Instruction fine-tuning**: The model is fine-tuned on an expert-annotated data set to further improve its performance on specific tasks, such as entity recognition and control method recommendation.
- **Evaluation and verification**: Experiments on the entity recognition and control method recommendation tasks verify the effectiveness of CataLM and demonstrate its potential in catalyst design.

### Goals:

- Provide a powerful tool that helps scientists conduct catalyst design and research more efficiently.
- Promote collaboration between humans and AI and accelerate innovation in the catalyst field.

Through these efforts, CataLM is expected to open new possibilities for catalyst design and to advance technologies related to sustainable development.
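The instruction fine-tuning stage listed under Solutions can be illustrated with a minimal sketch of how one expert-annotated passage might be turned into a prompt/response pair for the entity recognition task. Everything specific here is an assumption made for illustration: the prompt wording, the `instruction`/`output` field names, the entity type labels, and the sample passage are invented and are not taken from the paper.

```python
import json

# Hypothetical prompt template; the paper's actual prompt format is not given here.
PROMPT_TEMPLATE = (
    "Below is a passage from the electrocatalysis literature.\n"
    "Extract the entities of the requested types.\n\n"
    "Entity types: {types}\n"
    "Passage: {passage}\n"
)


def build_example(passage, entities):
    """Turn one annotated passage into an instruction-tuning record."""
    # Collect the distinct entity types present in the annotation.
    types = sorted({e["type"] for e in entities})
    prompt = PROMPT_TEMPLATE.format(types=", ".join(types), passage=passage)

    # Group entity mentions by type, keeping first-appearance order
    # and dropping repeated mentions.
    grouped = {t: [] for t in types}
    for e in entities:
        if e["text"] not in grouped[e["type"]]:
            grouped[e["type"]].append(e["text"])

    # The target response is the grouped entities serialized as JSON.
    return {"instruction": prompt, "output": json.dumps(grouped, ensure_ascii=False)}


# Invented sample record, for illustration only.
record = {
    "passage": "Cu nanoparticles reduce CO2 to ethylene.",
    "entities": [
        {"text": "Cu nanoparticles", "type": "material"},
        {"text": "ethylene", "type": "product"},
    ],
}

example = build_example(record["passage"], record["entities"])
print(example["instruction"])
print(example["output"])
```

Serializing the target as JSON keeps the expected answer machine-checkable, which is one plausible way to score entity recognition output; the paper's actual output format may differ.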

CataLM: Empowering Catalyst Design Through Large Language Models

Automation and Machine Learning Augmented by Large Language Models in Catalysis Study

Machine-learning-accelerated Discovery of Single-Atom Catalysts Based on Bidirectional Activation Mechanism

Open Challenges in Developing Generalizable Large-Scale Machine-Learning Models for Catalyst Discovery

An Artificial Intelligence (AI) workflow for catalyst design and optimization

Large language model enhanced corpus of CO₂ reduction electrocatalysts and synthesis procedures

Revisiting Electrocatalyst Design by a Knowledge Graph of Cu-Based Catalysts for CO₂ Reduction

Interpretable Machine Learning for Catalytic Materials Design toward Sustainability

Catalytic Large Atomic Model (CLAM): A Machine-Learning-Based Interatomic Potential Universal Model

Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions

A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery

Unlocking New Insights for Electrocatalyst Design: A Unique Data Science Workflow Leveraging Internet-Sourced Big Data

Machine Learning Interatomic Potentials for Catalysis

How Machine Learning Can Accelerate Electrocatalysis Discovery and Optimization

Machine Learning Accelerating Innovative Researches on Energy and Environmental Catalysts

Toward Excellence of Electrocatalyst Design by Emerging Descriptor‐Oriented Machine Learning

Machine Learning for Catalysis Informatics: Recent Applications and Prospects

Machine learning enabled rational design of atomic catalysts for electrochemical reactions