Abstract: We propose MatSci ML, a novel benchmark for modeling MATerials SCIence using Machine Learning (MatSci ML) methods focused on solid-state materials with periodic crystal structures. Applying machine learning methods to solid-state materials is a nascent field with substantial fragmentation, largely driven by the great variety of datasets used to develop machine learning models. This fragmentation makes it difficult to compare the performance and generalizability of different methods, thereby hindering overall research progress in the field. Building on open-source datasets, including large-scale datasets such as OpenCatalyst, OQMD, NOMAD, the Carolina Materials Database, and Materials Project, the MatSci ML benchmark provides a diverse set of materials systems and property data for model training and evaluation, including simulated energies, atomic forces, and material bandgaps, as well as classification data for crystal symmetries via space groups. The diversity of properties in MatSci ML makes it possible to implement and evaluate multi-task learning algorithms for solid-state materials, while the diversity of datasets facilitates the development of new, more generalized algorithms and methods across multiple datasets. In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to perform joint prediction of common properties, such as energy and forces. Using MatSci ML, we evaluate the performance of different graph neural networks and equivariant point cloud networks on several benchmark tasks spanning single-task, multi-task, and multi-dataset learning scenarios. Our open-source code is available at https://github.com/IntelLabs/matsciml.

What problem does this paper attempt to address?

The paper aims to address the challenges of applying machine learning methods to solid materials in materials science, particularly solid materials with periodic crystal structures. Specifically, the main objectives of the study include:

1. **Establishing a comprehensive benchmark dataset (MatSci ML)**: This benchmark dataset integrates various open-source datasets, such as OpenCatalyst, OQMD, NOMAD, Carolina Materials Database, and Materials Project, to provide diverse material systems and their property data for training and evaluating machine learning models.
2. **Supporting multi-task training**: MatSci ML includes support for multiple regression and classification tasks, allowing researchers to utilize multi-task training methods in modeling solid materials based on graph and point cloud representations.
3. **Achieving multi-dataset integration**: MatSci ML enables machine learning models to perform joint training on heterogeneous data from different datasets, which helps in developing more general, efficient, and accurate machine learning models and methods to handle solid materials.

Through these efforts, the paper aims to address several key issues present in current research:

- The differences in datasets used by different studies make it difficult to compare the performance and generalization capabilities of different methods, thereby hindering the progress of the entire field.
- Existing methods often focus only on specific properties (such as energy and force prediction), which limits their utility in practical applications.
- There is a lack of benchmark tests that can comprehensively evaluate the modeling capabilities of machine learning models for solid materials, especially datasets that cover a broader range of material systems and their properties.

In summary, the paper aims to advance machine learning for solid-state materials science by creating a comprehensive benchmark, MatSci ML, and by using it to make models more general and more practical.
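
As a rough illustration of the multi-task, multi-dataset setting described above, the following plain-Python sketch combines samples from two sources into one weighted loss. It does not use the MatSci ML code base: the dataset names, target values, predictions, task weights, and the `multitask_loss` helper are all hypothetical, chosen only to show how a task can be skipped for samples whose dataset lacks that label.

```python
from collections import defaultdict

# Hypothetical samples: each record names its source dataset and carries
# only the labels that dataset provides (values are made up for illustration).
samples = [
    {"dataset": "A", "targets": {"energy": -1.0, "band_gap": 0.5}},
    {"dataset": "B", "targets": {"energy": -2.0}},  # no band gap label
    {"dataset": "B", "targets": {"energy": -3.0, "band_gap": 1.5}},
]

# Hypothetical model outputs for the same samples, one value per task.
predictions = [
    {"energy": -1.5, "band_gap": 0.5},
    {"energy": -2.0, "band_gap": 0.9},
    {"energy": -2.0, "band_gap": 1.0},
]


def multitask_loss(samples, predictions, task_weights):
    """Weighted sum of per-task mean squared errors, skipping missing labels."""
    squared_errors = defaultdict(list)
    for sample, pred in zip(samples, predictions):
        # Only tasks with a label on this sample contribute to the loss.
        for task, target in sample["targets"].items():
            squared_errors[task].append((pred[task] - target) ** 2)
    # Average each task over the samples that actually carry its label.
    per_task = {task: sum(errs) / len(errs) for task, errs in squared_errors.items()}
    total = sum(task_weights[task] * loss for task, loss in per_task.items())
    return total, per_task


# Equal task weights are an assumption; in practice they would be tuned.
total, per_task = multitask_loss(
    samples, predictions, {"energy": 1.0, "band_gap": 1.0}
)
print(per_task, total)
```

Averaging each task over only the samples that carry its label keeps a dataset without, say, band gap labels from diluting or distorting that task's loss, which is one simple way to train jointly on heterogeneous datasets.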

MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling