Machine learning in materials research: developments over the last decade and challenges for the future

Anubhav Jain
DOI: https://doi.org/10.26434/chemrxiv-2024-x6spt
2024-02-26
Abstract: The number of studies that apply machine learning (ML) to materials science has been growing at a rate of approximately 1.67 times per year over the past decade. In this review, I examine this growth in various contexts. First, I present an analysis of the most commonly used tools (software, databases, materials science methods, and ML methods) within papers that apply ML to materials science. The analysis demonstrates that despite the growth of deep learning techniques, classical machine learning is still dominant overall. It also demonstrates how new research can effectively build upon past research, particularly in the domain of ML models trained on density functional theory calculation data. Next, I present the progression of best scores as a function of time on the matbench materials science benchmark for formation enthalpy prediction. In particular, a dramatic sevenfold reduction in error is obtained when progressing from feature-based methods that use conventional ML (random forest, support vector regression, etc.) to graph neural network techniques. Finally, I provide views on future challenges and opportunities, focusing on data size and complexity, extrapolation, interpretation, access, and relevance.
Chemistry
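To make the abstract's growth figure of approximately 1.67 times per year concrete, the short Python sketch below compounds a constant yearly factor. It assumes the rate is constant over ten full years, which simplifies what is really a fitted trend; the helper names (`compound_growth`, `annual_factor_from_counts`, `doubling_time`) are illustrative and do not come from the paper.

```python
import math


def compound_growth(annual_factor: float, years: int) -> float:
    """Total multiplicative growth after `years` at a constant `annual_factor`."""
    return annual_factor ** years


def annual_factor_from_counts(start_count: float, end_count: float, years: int) -> float:
    """Constant yearly factor that turns start_count into end_count over `years` years."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1.0 / years)


def doubling_time(annual_factor: float) -> float:
    """Years needed for the count to double at a constant annual_factor (> 1)."""
    if annual_factor <= 1.0:
        raise ValueError("annual_factor must exceed 1 for the count to double")
    return math.log(2.0) / math.log(annual_factor)


if __name__ == "__main__":
    # Assumption: a constant 1.67x yearly factor over ten full years.
    total = compound_growth(1.67, 10)
    print(f"Growth over 10 years: about {total:.0f}x")
    print(f"Doubling time: about {doubling_time(1.67):.2f} years")
```

Under that assumption, 1.67 times per year compounds to roughly a 169-fold increase over ten years, which corresponds to the number of studies doubling about every 1.35 years.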
What problem does this paper attempt to address?
This paper reviews how machine learning (ML) has been applied in materials science research over the past decade and discusses the challenges ahead. The number of such studies has grown by a factor of approximately 1.67 per year over that period. The author analyzes the tools most commonly used in these papers (software, databases, materials science methods, and ML methods) and finds that, although deep learning techniques have developed rapidly, classical machine learning still dominates overall. Using the matbench materials science benchmark for formation enthalpy prediction, the paper also shows that moving from feature-based methods with conventional ML to graph neural network techniques reduced prediction error by a factor of approximately 7.
The paper is divided into three parts. The first part analyzes how papers draw on tools from different fields, showing that ML research in materials science can quickly build on previous work by reusing existing databases, software libraries, and materials science methods; the most commonly cited software is scikit-learn, followed by databases of density functional theory (DFT) calculation data. The second part quantitatively tracks progress in structure-property prediction performance, where graph neural networks have delivered the largest gains, illustrating the rapid development of ML in materials science. The third part discusses future challenges: data size and complexity, extrapolation, interpretability, accessibility, and relevance.
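The "progression of best scores as a function of time" can be sketched as a running minimum over benchmark submissions, with the overall improvement expressed as a ratio of errors. This is a minimal illustration, assuming each submission is a (year, MAE) pair; the function names and the toy numbers in the example are hypothetical placeholders, not actual matbench entries or results.

```python
from itertools import accumulate
from typing import List, Tuple

Submission = Tuple[int, float]  # (year submitted, mean absolute error)


def best_so_far(submissions: List[Submission]) -> List[Submission]:
    """Return the lowest MAE achieved up to and including each submission, in time order."""
    ordered = sorted(submissions, key=lambda s: s[0])  # stable sort keeps same-year order
    running_min = accumulate((mae for _, mae in ordered), min)
    return [(year, best) for (year, _), best in zip(ordered, running_min)]


def fold_reduction(baseline_error: float, new_error: float) -> float:
    """How many times smaller new_error is than baseline_error."""
    if new_error <= 0:
        raise ValueError("new_error must be positive")
    return baseline_error / new_error


if __name__ == "__main__":
    # Hypothetical placeholder values in arbitrary units, NOT real matbench results.
    toy = [(2020, 0.5), (2018, 0.7), (2019, 0.3), (2021, 0.1)]
    curve = best_so_far(toy)
    for year, best in curve:
        print(year, best)
    print(f"fold reduction: {fold_reduction(curve[0][1], curve[-1][1]):.1f}x")
```

A worse submission (the 2020 entry above) leaves the best-so-far curve flat, which is why a leaderboard's progression is monotone even when individual methods vary.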
Future challenges include data volume and complexity, which call for better datasets and techniques for handling small data; evaluating and improving extrapolation capability; improving model interpretability to yield physical insight; and accessibility and relevance, including access to large language models and the reproducibility of scientific findings. Finally, as model performance improves, it becomes important to ensure that models serve practical scientific goals rather than merely chasing high benchmark scores.
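One common way to probe the extrapolation challenge is a target-based split: sort samples by property value and hold out the highest-valued tail as the test set, so test targets lie at or above the top of the training range. The sketch below assumes a plain list of target values; `extrapolation_split` is a hypothetical helper, and this scheme is a generic example rather than a method prescribed by the paper.

```python
from typing import List, Sequence, Tuple


def extrapolation_split(
    targets: Sequence[float], test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Split sample indices so the test set holds the highest-target samples.

    Returns (train_indices, test_indices). A model evaluated on the test set
    must predict values at or beyond the top of its training range.
    """
    if len(targets) < 2:
        raise ValueError("need at least two samples")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Indices ordered from lowest to highest target value.
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    # Keep at least one test sample and at least one training sample.
    n_test = min(len(targets) - 1, max(1, round(len(targets) * test_fraction)))
    return order[:-n_test], order[-n_test:]


if __name__ == "__main__":
    # Hypothetical property values for illustration only.
    values = [3.0, 1.0, 5.0, 2.0, 4.0]
    train_idx, test_idx = extrapolation_split(values, test_fraction=0.4)
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

A random split, by contrast, mostly tests interpolation, so comparing errors under both kinds of split gives a rough sense of how much a model's performance degrades outside its training distribution.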