Probing out-of-distribution generalization in machine learning for materials

Kangming Li, Andre Niyongabo Rubungo, Xiangyun Lei, Daniel Persaud, Kamal Choudhary, Brian DeCost, Adji Bousso Dieng, Jason Hattrick-Simpers

2024-06-11

Abstract: Scientific machine learning (ML) endeavors to develop generalizable models with broad applicability. However, the assessment of generalizability is often based on heuristics. Here, we demonstrate in the materials science setting that heuristics-based evaluations lead to substantially biased conclusions of ML generalizability and benefits of neural scaling. We evaluate generalization performance in over 700 out-of-distribution tasks that feature new chemistry or structural symmetry not present in the training data. Surprisingly, good performance is found in most tasks and across various ML models including simple boosted trees. Analysis of the materials representation space reveals that most tasks contain test data that lie in regions well covered by training data, while poorly performing tasks contain mainly test data outside the training domain. For the latter case, increasing training set size or training time has marginal or even adverse effects on the generalization performance, contrary to what the neural scaling paradigm assumes. Our findings show that most heuristically defined out-of-distribution tests are not genuinely difficult and evaluate only the ability to interpolate. Evaluating on such tasks rather than the truly challenging ones can lead to an overestimation of generalizability and benefits of scaling.
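A heuristic OOD task of the "new chemistry" kind can be built by holding out every material that contains a given element, so the training set never sees that element. The plain-Python sketch below is a hypothetical illustration of that idea, not the authors' code: the record format, the function names `split_by`, `leave_one_element_out` and `all_element_tasks`, and the toy data are all assumptions. A structural-symmetry task would follow the same pattern, with a predicate on a symmetry label (for example, a crystal system) instead of on the composition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record format (an assumption, not the paper's data schema):
# (material_id, composition as {element: amount}, target property value)
Record = Tuple[str, Dict[str, float], float]


def split_by(records: List[Record], is_ood: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Send every record for which `is_ood` is True to the test set, the rest to training."""
    train = [r for r in records if not is_ood(r)]
    test = [r for r in records if is_ood(r)]
    return train, test


def leave_one_element_out(records: List[Record], element: str) -> Tuple[List[Record], List[Record]]:
    """Hold out all materials containing `element` (positive amount in the composition)."""
    return split_by(records, lambda r: r[1].get(element, 0.0) > 0)


def all_element_tasks(records: List[Record]) -> Dict[str, Tuple[List[Record], List[Record]]]:
    """Build one leave-one-element-out task per element, skipping tasks with an empty split."""
    elements = sorted({el for _, comp, _ in records for el, amt in comp.items() if amt > 0})
    tasks = {}
    for el in elements:
        train, test = leave_one_element_out(records, el)
        if train and test:
            tasks[el] = (train, test)
    return tasks


if __name__ == "__main__":
    # Toy data: the target values are placeholders, not real material properties.
    data = [
        ("m1", {"Fe": 2.0, "O": 3.0}, 0.1),
        ("m2", {"Na": 1.0, "Cl": 1.0}, 0.2),
        ("m3", {"Fe": 1.0, "S": 2.0}, 0.3),
        ("m4", {"Mg": 1.0, "O": 1.0}, 0.4),
    ]
    train, test = leave_one_element_out(data, "Fe")
    print([r[0] for r in train], [r[0] for r in test])  # ['m2', 'm4'] ['m1', 'm3']
    print(sorted(all_element_tasks(data)))  # ['Cl', 'Fe', 'Mg', 'Na', 'O', 'S']
```

Keeping the split logic in a generic `split_by` makes it easy to swap the chemistry predicate for a symmetry predicate; the paper's exact task definitions are not reproduced here.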

Materials Science

What problem does this paper attempt to address?

The paper examines how well machine learning (ML) models in materials science generalize to out-of-distribution (OOD) data. It points out that generalization is usually assessed with simple heuristics for defining OOD test sets; these heuristics can be subjective, differ from study to study, and lead to mistaken conclusions about how well a model generalizes.

To test this, the paper systematically analyzes more than 700 OOD tasks, each of which holds out materials whose chemistry or structural symmetry is absent from the training data. Various existing ML models, including simple boosted trees, perform well on most of these tasks. On the tasks where performance is poor, the test data tend to lie outside the domain covered by the training data, and increasing the training set size or training time has only marginal or even adverse effects on generalization performance, contrary to what the neural scaling paradigm assumes.

By analyzing the materials representation space, the paper further illustrates what distinguishes well-performing from poorly performing tasks and proposes a method to differentiate statistically out-of-distribution data from representationally out-of-distribution data. In summary, most heuristic-based OOD tests are not truly difficult: they assess interpolation rather than extrapolation. Evaluating on such tasks can therefore overestimate both the generalization ability of models and the benefits of scaling.
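The paper links poor OOD performance to test data lying outside the region of representation space covered by the training data. One simple way to quantify such coverage is a nearest-neighbour distance check: a test point counts as "covered" if it is no farther from the training set than most training points are from each other. This is a swapped-in, illustrative proxy, not the paper's own analysis; the function names, the Euclidean metric, and the 95th-percentile threshold are assumptions, and the input vectors stand for whatever material representation one chooses (for example, learned features).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(x: Vector, points: List[Vector]) -> float:
    """Euclidean distance from x to its nearest neighbour in `points`."""
    return min(math.dist(x, p) for p in points)


def coverage_threshold(train: List[Vector], quantile: float = 0.95) -> float:
    """Distance below which a point counts as covered by the training data.

    Assumed rule: take each training point's distance to its nearest *other*
    training point and return the `quantile`-th value of those distances.
    """
    if len(train) < 2:
        raise ValueError("need at least two training points")
    loo = sorted(
        nearest_distance(x, train[:i] + train[i + 1:]) for i, x in enumerate(train)
    )
    idx = min(len(loo) - 1, int(quantile * len(loo)))
    return loo[idx]


def fraction_outside(train: List[Vector], test: List[Vector], quantile: float = 0.95) -> float:
    """Fraction of test points farther from the training set than the threshold."""
    if not test:
        raise ValueError("test set is empty")
    tau = coverage_threshold(train, quantile)
    outside = sum(1 for x in test if nearest_distance(x, train) > tau)
    return outside / len(test)


if __name__ == "__main__":
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    test = [(0.5, 0.5), (5.0, 5.0)]  # one interpolating point, one far-away point
    print(fraction_outside(train, test))  # 0.5
```

Under this proxy, a task whose `fraction_outside` is near zero mostly tests interpolation, while a large fraction flags test data outside the training domain. The brute-force search is quadratic in the training set size, so a spatial index would be needed for large datasets.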
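The neural scaling paradigm that the paper tests against commonly assumes that error falls as a power law in training set size N, error ≈ a * N**b with b < 0. A hedged way to check this on a learning curve is to fit the exponent b by ordinary least squares in log-log space, as in the plain-Python sketch below. The function name and interface are hypothetical, and the same fit could be applied with training time (for example, epochs) in place of N. An exponent near zero corresponds to a marginal benefit from scaling, and a positive one to an adverse effect.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit errors ≈ a * sizes**b by least squares on log(error) vs log(size).

    Returns (a, b). Clearly negative b: error shrinks as training grows.
    b near zero or positive: scaling helps little or hurts.
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs")
    if min(sizes) <= 0 or min(errors) <= 0:
        raise ValueError("sizes and errors must be positive for a log-log fit")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope in log-log space = power-law exponent
    a = math.exp(my - b * mx)  # intercept mapped back to linear scale
    return a, b
```

For example, errors of 1.0, 0.5 and 0.25 at N = 100, 1000 and 10000 (halving per tenfold increase) give b = log10(0.5) ≈ -0.301 and a = 4, whereas a flat learning curve gives b ≈ 0.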
