UniMat: Unifying Materials Embeddings through Multi-modal Learning

Janghoon Ock, Joseph Montoya, Daniel Schweigert, Linda Hung, Santosh K. Suram, Weike Ye
2024-11-13
Abstract: Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal learning, particularly in vision and language models, have opened new avenues for integrating data in different forms. In this work, we evaluate common techniques in multi-modal learning (alignment and fusion) in unifying some of the most important modalities in materials science: atomic structure, X-ray diffraction patterns (XRD), and composition. We show that structure graph modality can be enhanced by aligning with XRD patterns. Additionally, we show that aligning and fusing more experimentally accessible data formats, such as XRD patterns and compositions, can create more robust joint embeddings than individual modalities across various tasks. This lays the groundwork for future studies aiming to exploit the full potential of multi-modal data in materials science, facilitating more informed decision-making in materials design and discovery.
Machine Learning, Materials Science
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper "UniMat: Unifying Materials Embeddings through Multi-modal Learning" addresses data heterogeneity and the integration of multimodal information in materials science research. It focuses on three questions:

1. **Enhancing Single-Modality Prediction Capability**: Can the predictive capability of a single modality (such as the crystal structure graph) be improved by integrating information from other modalities? For example, does aligning X-ray diffraction (XRD) patterns with structure graphs yield more accurate predictions of lattice lengths and angles?
2. **Integration of Experimentally Accessible Modalities**: Can information that typically requires simulation (such as crystal structure) be obtained by integrating experimentally accessible data formats (such as XRD patterns and composition)? This matters in practice because obtaining some information, such as the crystal structure, directly from experiment is often challenging.
3. **Complementary Role of Weakly Sensitive Modalities**: Can prediction performance be improved by aligning and fusing a modality that is only weakly sensitive to the target property? For example, composition alone may have weak predictive power for structural information; does combining it with XRD patterns improve the predictions?

### Research Background

Materials science datasets are inherently heterogeneous and are usually represented in different modalities, such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. Each modality has a different sensitivity to different material properties, so a comprehensive understanding of a material requires integrating information across modalities. The heterogeneity of the data and of the information embedded in it has made this integration difficult.
### Solution

The paper proposes UniMat, a multimodal learning approach that builds unified materials embeddings in two steps:

1. **Modality Alignment**: Contrastive loss functions (such as the contrastive loss in CLIP) align the embeddings of different modalities, increasing the similarity of embeddings that belong to the same material while pushing embeddings of different materials apart.
2. **Modality Fusion**: Embeddings from different modalities are concatenated and then further encoded with a multi-layer perceptron (MLP). A simple concatenation method was used as a baseline in the study.

### Experimental Results

- **Alignment Effect**: After aligning structure graphs with XRD patterns, the prediction error for lattice lengths decreased from 0.20 Å to 0.14 Å, and the prediction error for lattice angles remained at 4.5°, comparable to the prediction performance using only XRD patterns.
- **Fusion Effect**: After fusing XRD patterns and composition information, the prediction error for lattice parameters decreased, with an MAE of 0.13 Å for lattice lengths and 4.3° for lattice angles. The classification accuracy reached 85%, significantly outperforming the prediction results using composition information alone.

### Conclusion

Through multimodal learning, the UniMat method can enhance the prediction capability of single modalities, integrate experimentally accessible modalities, and exploit the complementary role of weakly sensitive modalities. These results open new avenues for accelerating materials discovery, particularly in addressing major challenges in fields such as environment, energy, and security.
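
### Illustrative Sketch of Alignment and Fusion

The two steps listed under Solution can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`clip_style_loss`, `fuse`), the temperature of 0.07, the embedding sizes, the two-layer ReLU MLP, and the random stand-in embeddings are all assumptions chosen for illustration. The sketch shows a symmetric CLIP-style contrastive loss, where row `i` of each input is assumed to describe the same material, followed by concatenation-plus-MLP fusion.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def clip_style_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss; row i of emb_a and emb_b is the same material."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    logits = a @ b.T / temperature           # (batch, batch) similarity matrix
    diag = np.arange(len(a))                 # matching pairs sit on the diagonal
    loss_a_to_b = -log_softmax(logits)[diag, diag].mean()
    loss_b_to_a = -log_softmax(logits.T)[diag, diag].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)

def fuse(emb_a, emb_b, w1, b1, w2, b2):
    """Concatenate two embeddings and encode them with a two-layer MLP."""
    joint = np.concatenate([emb_a, emb_b], axis=1)
    hidden = np.maximum(joint @ w1 + b1, 0.0)    # ReLU activation
    return hidden @ w2 + b2

# Hypothetical stand-ins for encoder outputs (random data, not real materials).
rng = np.random.default_rng(0)
batch, dim, hidden_dim, out_dim = 8, 16, 32, 16
xrd = rng.normal(size=(batch, dim))
comp_aligned = xrd + 0.01 * rng.normal(size=(batch, dim))  # nearly matching pairs
comp_random = rng.normal(size=(batch, dim))                # unrelated pairs

loss_aligned = clip_style_loss(xrd, comp_aligned)
loss_random = clip_style_loss(xrd, comp_random)

# Untrained MLP weights, only to show the shapes involved in fusion.
w1 = rng.normal(scale=0.1, size=(2 * dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
b2 = np.zeros(out_dim)
fused = fuse(xrd, comp_aligned, w1, b1, w2, b2)
```

In this toy setup the loss is lower for the nearly matching pairs than for the unrelated ones, which is the behavior that alignment training tries to produce. In practice the encoders and the MLP weights would be trained, and the fused embedding would feed a downstream head such as a lattice-parameter regressor.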