Abstract: For over a decade, machine learning (ML) models have been making strides in computer vision and natural language processing (NLP), demonstrating high proficiency in specialized tasks. The emergence of large-scale language and generative image models, such as ChatGPT and Stable Diffusion, has significantly broadened the accessibility and application scope of these technologies. Traditional predictive models are typically constrained to mapping input data to numerical values or predefined categories, which limits their usefulness beyond their designated tasks. In contrast, contemporary models employ representation learning and generative modeling, enabling them to extract and encode key insights from a wide variety of data sources and decode them to create novel responses for desired goals. They can also interpret queries phrased in natural language to deduce the intended output. In parallel, the application of ML techniques in materials science has advanced considerably, particularly in areas like inverse design, material prediction, and atomic modeling. Despite these advancements, current models are overly specialized, which hinders their potential to supplant established industrial processes. Materials science therefore needs a comprehensive, versatile model capable of interpreting human-readable inputs, intuiting a wide range of possible search directions, and delivering precise solutions. To realize such a model, the field must adopt cutting-edge representation, generative, and foundation model techniques tailored to materials science. A pivotal component in this endeavor is the establishment of an extensive, centralized dataset encompassing a broad spectrum of research topics. This dataset could be assembled by crowdsourcing global research contributions and by developing models that extract data from existing literature and represent it in a homogeneous format.
Such a massive dataset can be used to train a central model that learns the underlying physics of the target areas and can then be connected to a variety of specialized downstream tasks. Ultimately, the envisioned model would empower users to intuitively pose queries for a wide array of desired outcomes. It would facilitate the search for existing data that closely matches the sought-after solutions and, when pre-existing solutions fall short, leverage its understanding of physics and material-behavior relationships to innovate new ones.

1.5 million materials narratives generated by chatbots

MatChat: A Large Language Model and Application Service Platform for Materials Science

Agent-based Learning of Materials Datasets from Scientific Literature

ChatGPT in the Material Design: Selected Case Studies to Assess the Potential of ChatGPT

Generative artificial intelligence and its applications in materials science: Current situation and future perspectives

ChatGPT for Computational Materials Science: A Perspective

Advancing materials science through next-generation machine learning

Challenges and Limitations of ChatGPT and Artificial Intelligence for Scientific Research: A Perspective from Organic Materials

AlphaMat: A Material Informatics Hub Connecting Data, Features, Models and Applications

ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models

MatGPT: A Vane of Materials Informatics from Past, Present, to Future

Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning

Artificial Intelligence Agents for Materials Sciences

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

Interdisciplinary Discovery of Nanomaterials Based on Convolutional Neural Networks

Distinguishing Chatbot from Human

A Survey of Datasets, Preprocessing, Modeling Mechanisms, and Simulation Tools Based on AI for Material Analysis and Discovery

NLP meets Materials Science: Quantifying the presentation of materials data in scientific literature