Crystal Structure Generation with Autoregressive Large Language Modeling

Luis M. Antunes, Keith T. Butler, Ricardo Grau-Crespo

2024-02-13

Abstract: The generation of plausible crystal structures is often the first step in predicting the structure and properties of a material from its chemical composition. Quickly generating and predicting inorganic crystal structures is important for the discovery of new materials, which can target applications such as energy or electronic devices. However, most current methods for crystal structure prediction are computationally expensive, slowing the pace of innovation. Seeding structure prediction algorithms with quality generated candidates can overcome a major bottleneck. Here, we introduce CrystaLLM, a methodology for the versatile generation of crystal structures, based on the autoregressive large language modeling (LLM) of the Crystallographic Information File (CIF) format. Trained on millions of CIF files, CrystaLLM focuses on modeling crystal structures through text. CrystaLLM can produce plausible crystal structures for a wide range of inorganic compounds unseen in training, as demonstrated by ab initio simulations. The integration with predictors of formation energy permits the use of a Monte Carlo Tree Search algorithm to improve the generation of meaningful structures. Our approach challenges conventional representations of crystals, and demonstrates the potential of LLMs for learning effective 'world models' of crystal chemistry, which will lead to accelerated discovery and innovation in materials science.
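To make "autoregressive modeling of CIF text" concrete, the sketch below generates text one token at a time, each token sampled conditionally on what has already been written. It is a deliberately tiny stand-in: a character-level bigram counter replaces the paper's LLM, and the two-entry corpus, the `data_` prompt, and the function names `train_bigram` and `generate` are all illustrative assumptions, not CrystaLLM's actual tokenizer, architecture, or training data.

```python
import random
from collections import defaultdict

# Toy corpus of CIF-like fragments (hypothetical, for illustration only).
CORPUS = [
    "data_NaCl\n_cell_length_a 5.64\n",
    "data_KCl\n_cell_length_a 6.29\n",
]


def train_bigram(texts):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in texts:
        for cur, nxt in zip(text, text[1:]):
            counts[cur][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens=40, seed=0):
    """Extend `prompt` autoregressively: sample the next character from the
    distribution conditioned on the last character, append it, and repeat."""
    rng = random.Random(seed)
    out = prompt
    for _ in range(max_new_tokens):
        options = counts.get(out[-1])
        if not options:  # no observed continuation for this character
            break
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    print(generate(model, "data_"))
```

A real model conditions on a much longer context than one character, but the generation loop has the same shape: prompt, sample, append, repeat.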

Materials Science

What problem does this paper attempt to address?

The paper asks how crystal structures can be generated efficiently enough to accelerate materials discovery. It introduces CrystaLLM, a method that applies autoregressive large language modeling (LLM) to the Crystallographic Information File (CIF) format and generates crystal structures for a wide range of inorganic compounds. Current crystal structure prediction methods are computationally expensive; because CrystaLLM learns crystal structures as text, it can supply high-quality candidate structures to structure prediction algorithms, reducing their computational cost.

The authors train an LLM specifically for crystal generation. The model is trained directly on the text of millions of CIF files rather than on natural language or chemical composition alone. The results show that CrystaLLM generates plausible crystal structures for inorganic compounds not encountered during training. Combining CrystaLLM with a Monte Carlo Tree Search algorithm, guided by predictors of formation energy, further improves the generated structures.

The paper also compares CrystaLLM with other machine learning methods for crystal structure prediction. CrystaLLM performs well on multiple benchmarks and outperforms the other models in match rate and root mean square error. Finally, the paper gives examples of structures generated for compound classes such as rutiles, spinels, and elpasolites, illustrating how the model generalizes to unseen structures and element combinations.
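The two benchmark metrics named above can be summarized with a few lines of arithmetic. The sketch below assumes a common convention that this page does not spell out: the match rate is the fraction of test compounds for which a generated structure matches the reference, and the root mean square error is averaged over matched compounds only. The structure comparison itself is assumed to have been done upstream, and the function name `summarize_benchmark` and the sample numbers are hypothetical.

```python
from statistics import mean


def summarize_benchmark(results):
    """Summarize per-compound benchmark outcomes.

    `results` is a list of (matched, rms) pairs, one per test compound:
    `matched` is True when a generated structure matched the reference,
    and `rms` is the RMS displacement of that match (None if unmatched).
    Returns (match_rate, mean_rms_over_matches).
    """
    if not results:
        raise ValueError("results must not be empty")
    matched_rms = [rms for matched, rms in results if matched]
    match_rate = len(matched_rms) / len(results)
    # The error is only defined for matched compounds (assumed convention).
    mean_rms = mean(matched_rms) if matched_rms else None
    return match_rate, mean_rms


if __name__ == "__main__":
    # Hypothetical outcomes for four test compounds.
    outcomes = [(True, 0.05), (False, None), (True, 0.15), (True, 0.10)]
    rate, rms = summarize_benchmark(outcomes)
    print(f"match rate = {rate:.2f}, mean RMS over matches = {rms:.3f}")
```

Because the error is averaged over matches only, the two numbers should be read together: a model can post a low error simply by matching only the easy cases.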

CrysText: A Generative AI Approach for Text-Conditioned Crystal Structure Generation using LLM

Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells

Explainable Synthesizability Prediction of Inorganic Crystal Structures using Large Language Models

Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs using Large Language Models

Self-Supervised Generative Models for Crystal Structures

Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

3-D Inorganic Crystal Structure Generation and Property Prediction via Representation Learning

Generative Hierarchical Materials Search

Crystal Structure Generation Based On Material Properties

CRYSPNet: Crystal Structure Predictions via Neural Network

A machine learning potential-based generative algorithm for on-lattice crystal structure prediction

Crystal Structure Prediction Using Generative Adversarial Network with Data-Driven Latent Space Fusion Strategy

Exploration of crystal chemical space using text-guided generative artificial intelligence

Accelerated Organic Crystal Structure Prediction with Genetic Algorithms and Machine Learning

Crystal structure prediction with machine learning-based element substitution

A Robust Crystal Structure Prediction Method to Support Small Molecule Drug Development with Large Scale Validation and Prospective Studies

Learning Atoms from Crystal Structure

Efficient Probabilistic Modeling of Crystallization at Mesoscopic Scale

Degenerative changes in fresh aortic root homografts in a canine model: evidence of an immunologic influence.

Unified Model for Crystalline Material Generation