A Case Study of Multimodal, Multi-institutional Data Management for the Combinatorial Materials Science Community

Sarah I. Allec, Eric S. Muckley, Nathan S. Johnson, Christopher K. H. Borg, Dylan J. Kirsch, Joshua Martin, Rohit Pant, Ichiro Takeuchi, Andrew S. Lee, James E. Saal, Logan Ward, Apurva Mehta
DOI: https://doi.org/10.1007/s40192-024-00345-7
2024-03-22
Integrating Materials and Manufacturing Innovation
Abstract: Although the convergence of high-performance computing, automation, and machine learning has significantly altered the materials design timeline, transformative advances in functional materials and acceleration of their design will require addressing the deficiencies that currently exist in materials informatics, particularly a lack of standardized experimental data management. The challenges associated with experimental data management are especially true for combinatorial materials science, where advancements in automation of experimental workflows have produced datasets that are often too large and too complex for human reasoning. The data management challenge is further compounded by the multimodal and multi-institutional nature of these datasets, as they tend to be distributed across multiple institutions and can vary substantially in format, size, and content. Furthermore, modern materials engineering requires the tuning of not only composition but also of phase and microstructure to elucidate processing–structure–property–performance relationships. To adequately map a materials design space from such datasets, an ideal materials data infrastructure would contain data and metadata describing (i) synthesis and processing conditions, (ii) characterization results, and (iii) property and performance measurements. Here, we present a case study for the low-barrier development of such a dashboard that enables standardized organization, analysis, and visualization of a large data lake consisting of combinatorial datasets of synthesis and processing conditions, X-ray diffraction patterns, and materials property measurements generated at several different institutions. While this dashboard was developed specifically for data-driven thermoelectric materials discovery, we envision the adaptation of this prototype to other materials applications, and, more ambitiously, future integration into an all-encompassing materials data management infrastructure.
materials science, multidisciplinary; engineering, manufacturing
What problem does this paper attempt to address?
This paper discusses the challenges of managing multimodal and multi-institutional data in materials science, particularly in combinatorial materials science. The convergence of high-performance computing, automation, and machine learning has significantly altered the materials design timeline. However, achieving transformative advances in functional materials and accelerating their design requires addressing the deficiencies in materials informatics, especially the lack of standardized experimental data management. The paper argues that this challenge is particularly acute in combinatorial materials science, where automated experimental workflows produce datasets that are often too large and too complex for human reasoning. These datasets are typically distributed across multiple institutions and vary substantially in format, size, and content. Moreover, modern materials engineering requires tuning not only composition but also phase and microstructure to reveal processing–structure–property–performance relationships. An ideal materials data infrastructure should therefore contain data and metadata describing synthesis and processing conditions, characterization results, and property and performance measurements. The paper presents a case study of the low-barrier development of a dashboard that standardizes the organization, analysis, and visualization of a large data lake of combinatorial datasets generated at several different institutions. The data lake includes synthesis and processing conditions, X-ray diffraction patterns, and materials property measurements. The dashboard was developed specifically for data-driven thermoelectric materials discovery, but the authors envision adapting it to other materials applications and eventually integrating it into a comprehensive materials data management infrastructure. The paper also discusses the importance of data standardization, persistence, and cross-institutional access, as well as the use of Globus tools for file storage and user authentication.
The web-based dashboard developed in the study features a graphical user interface that supports data upload, organization, processing, analysis, and visualization, reducing the difficulty of managing multimodal data and improving collaboration efficiency. In conclusion, this paper aims to address the challenges of data management in materials science by establishing a standardized data infrastructure to facilitate the integration, analysis, and sharing of experimental data, thereby accelerating the design and discovery of new materials.
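As an illustration of the three kinds of data named above (synthesis and processing conditions, characterization results, and property measurements), the following is a minimal sketch of how one sample in such a data lake might be represented. It is not the schema used in the paper: every class, field, and function name here (`SampleRecord`, `XrdPattern`, `missing_modalities`, `group_by_institution`) is a hypothetical choice made for this example, and the field values are placeholders rather than measured data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class XrdPattern:
    """One X-ray diffraction pattern (hypothetical layout)."""
    two_theta_deg: List[float]
    intensity: List[float]

    def __post_init__(self) -> None:
        # Angles and intensities must pair up one-to-one.
        if len(self.two_theta_deg) != len(self.intensity):
            raise ValueError("two_theta_deg and intensity must have equal length")


@dataclass
class SampleRecord:
    """One sample with its three data modalities (hypothetical schema)."""
    sample_id: str
    institution: str
    synthesis: Dict[str, float] = field(default_factory=dict)   # e.g. anneal temperature
    xrd: Optional[XrdPattern] = None                             # characterization result
    properties: Dict[str, float] = field(default_factory=dict)  # e.g. Seebeck coefficient

    def missing_modalities(self) -> List[str]:
        """Return the names of the modalities not yet uploaded for this sample."""
        missing = []
        if not self.synthesis:
            missing.append("synthesis")
        if self.xrd is None:
            missing.append("characterization")
        if not self.properties:
            missing.append("properties")
        return missing


def group_by_institution(records: List[SampleRecord]) -> Dict[str, List[SampleRecord]]:
    """Group records by the institution that generated them."""
    grouped: Dict[str, List[SampleRecord]] = {}
    for record in records:
        grouped.setdefault(record.institution, []).append(record)
    return grouped
```

Keeping all three modalities on one record, keyed by a shared sample identifier, is one simple way to let data uploaded by different institutions be joined later. A real system would also need units, provenance, and file references, which this sketch omits.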