Abstract: Self-supervised neural language models have recently achieved unprecedented success, from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in generation, structure classification, and functional prediction for proteins and molecules using learned representations. However, most masking-based pre-trained language models are not designed for generative design, and their black-box nature makes their design logic difficult to interpret. Here we propose BLMM Crystal Transformer, a neural-network-based probabilistic generative model for generative and tinkering design of inorganic materials. Our model is built on the blank-filling language model for text generation and has demonstrated unique advantages in learning the "materials grammars", together with high-quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7\% charge neutrality and 84.8\% balanced electronegativity, which are more than 4 and 8 times higher, respectively, than a pseudo-random sampling baseline. The probabilistic generation process of BLMM allows it to recommend tinkering operations based on learned materials chemistry, which makes it useful for materials doping. Combined with the TCSP crystal structure prediction algorithm, we have applied our model to discover a set of new materials, as validated using DFT calculations. Our work thus brings unsupervised, transformer-language-model-based generative artificial intelligence to inorganic materials. A user-friendly web app has been developed for computational materials doping and can be accessed freely at \url{http://www.materialsatlas.org/blmtinker}.
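The validity figures above refer to two screening rules often applied to generated compositions. Charge neutrality means that some assignment of common oxidation states sums to zero. Balanced electronegativity is a Pauling-style test: every cation is less electronegative than every anion. The sketch below illustrates both checks in the style of SMACT-like screening. It is an assumption for illustration, not the paper's code. The oxidation-state table is a small hand-picked subset, the formula parser only handles simple formulas without parentheses, and the names `parse_formula`, `is_charge_neutral`, `is_electronegativity_balanced`, and `validity_rates` are hypothetical.

```python
import re
from itertools import product

# Hypothetical, hand-picked subset of common oxidation states (not exhaustive).
# Formulas containing other elements will raise KeyError.
OXIDATION_STATES = {
    "Li": (1,), "Na": (1,), "K": (1,),
    "Mg": (2,), "Ca": (2,), "Sr": (2,), "Ba": (2,),
    "Al": (3,), "Ti": (2, 3, 4), "Fe": (2, 3), "Zn": (2,),
    "O": (-2,), "S": (-2,), "F": (-1,), "Cl": (-1,), "N": (-3,),
}

# Pauling electronegativities for the same elements.
PAULING_EN = {
    "Li": 0.98, "Na": 0.93, "K": 0.82,
    "Mg": 1.31, "Ca": 1.00, "Sr": 0.95, "Ba": 0.89,
    "Al": 1.61, "Ti": 1.54, "Fe": 1.83, "Zn": 1.65,
    "O": 3.44, "S": 2.58, "F": 3.98, "Cl": 3.16, "N": 3.04,
}


def parse_formula(formula):
    """Turn a simple formula such as 'SrTiO3' into {'Sr': 1, 'Ti': 1, 'O': 3}."""
    counts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(count or 1)
    return counts


def neutral_assignments(counts):
    """Yield every per-element oxidation-state assignment whose total charge is zero."""
    elements = list(counts)
    for states in product(*(OXIDATION_STATES[el] for el in elements)):
        if sum(q * counts[el] for el, q in zip(elements, states)) == 0:
            yield dict(zip(elements, states))


def is_charge_neutral(formula):
    """True if at least one charge-neutral oxidation-state assignment exists."""
    return any(True for _ in neutral_assignments(parse_formula(formula)))


def is_electronegativity_balanced(formula):
    """True if some neutral assignment has every cation less electronegative
    than every anion (so this test also requires charge neutrality)."""
    for assignment in neutral_assignments(parse_formula(formula)):
        cations = [PAULING_EN[el] for el, q in assignment.items() if q > 0]
        anions = [PAULING_EN[el] for el, q in assignment.items() if q < 0]
        if cations and anions and max(cations) < min(anions):
            return True
    return False


def validity_rates(formulas):
    """Fraction of formulas passing each check, like the percentages quoted above."""
    n = len(formulas)
    neutral = sum(is_charge_neutral(f) for f in formulas) / n
    balanced = sum(is_electronegativity_balanced(f) for f in formulas) / n
    return neutral, balanced


print(validity_rates(["SrTiO3", "NaO", "MgCl2", "Na2O"]))  # (0.75, 0.75)
```

In this toy batch, NaO fails because no assignment from the table balances +1 against -2. The other three pass both checks. A real evaluation would use a complete oxidation-state and electronegativity table and a formula parser that handles parentheses and fractional stoichiometry.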

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Crystal Composition Transformer: Self-Learning Neural Language Model for Generative and Tinkering Design of Materials

Crystal Transformer: Self-learning neural language model for Generative and Tinkering Design of Materials

Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files

Large language models design sequence-defined macromolecules via evolutionary optimization

Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs using Large Language Models

Assessment of Fine-Tuned Large Language Models for Real-World Chemistry and Material Science Applications

Explainable Synthesizability Prediction of Inorganic Crystal Structures using Large Language Models

Large Language Models as Molecular Design Engines

Large Language Models for Inorganic Synthesis Predictions

Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells

Fine-tuning Large Language Models for Chemical Text Mining

Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?

CrysText: A Generative AI Approach for Text-Conditioned Crystal Structure Generation using LLM

Atom-by-atom protein generation and beyond with language models

Large Language Model-Guided Prediction Toward Quantum Materials Synthesis

Crystal Structure Generation with Autoregressive Large Language Modeling

Atomic structure generation from reconstructing structural fingerprints

Generative Hierarchical Materials Search

MatterGen: a generative model for inorganic materials design