Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang
2024-04-24
Abstract: Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultaneously handle semantic and numerical information. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting KG into a canonical KG and utilizes a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG facilitates end-to-end processing of cross-modal data, mining relationships and cross-modal information in small-sample datasets, and fully utilizes valuable experimental data to enhance material prediction. We further propose two new High-Entropy Alloys (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on two material datasets. Besides, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, showing improvements of 22.2% and 54.3%, highlighting its potential application and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can facilitate the communities of KG and AI for materials.
Machine Learning, Materials Science
What problem does this paper attempt to address?
This paper addresses two major challenges in material property prediction:

1. **Scarcity of high-quality material data**: Material properties are obtained mainly through experiments, and preparing and testing materials takes considerable time and resources, so the amount of available data is very limited.
2. **Difficulty in effectively representing and utilizing multi-modal data**: In materials science, cross-modal machine learning (cross-modal ML) requires a large amount of data, which is difficult to obtain. As a result, existing methods often ignore semantic information (such as material processing techniques) or fail to integrate numerical and semantic information effectively.

Specifically, the paper points out:

- Existing methods usually use machine learning (ML) algorithms to regress numerical properties, or transfer other pre-trained knowledge graphs (KGs) to materials, but these methods cannot handle semantic and numerical information simultaneously.
- On small-sample datasets, existing methods ignore the relationships between samples and focus only on the relationships between material features and properties.

To solve these problems, the authors propose the **Numerical Reasoning Knowledge Graph (NR-KG)** method. The main contributions of NR-KG include:

- Constructing a cross-modal knowledge graph (cross-modal KG) that captures numerical and semantic information simultaneously.
- Proposing a novel projection prediction loss for extracting semantic features from numerical information.
- Using a graph neural network (GNN) for end-to-end cross-modal data processing, mining relationships and cross-modal information in small-sample datasets.
- Proposing two new high-entropy alloy (HEA) property datasets with semantic descriptions, covering properties such as hardness and corrosion resistance, which enrich the available research data.

Through these innovations, NR-KG not only performs well on small-sample datasets but also generalizes well to public physical chemistry molecular datasets.
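The idea of a cross-modal KG built from semantic nodes and numerical proxy nodes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CrossModalKG` class, the `has_value` relation, the choice to store each numerical value on the edge between a sample and its proxy node, and the sample data (`HEA-1`, `HEA-2`, their fractions and processing method) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CrossModalKG:
    """Toy cross-modal graph (hypothetical, not the paper's code)."""
    nodes: dict = field(default_factory=dict)  # node name -> node kind
    edges: list = field(default_factory=list)  # (head, relation, tail, value)

    def add_node(self, name, kind):
        # Keep the first kind registered for a name, so shared nodes are reused.
        self.nodes.setdefault(name, kind)

    def add_sample(self, sample_id, semantic, numerical):
        self.add_node(sample_id, "sample")
        # Semantic facts become ordinary entity nodes and relations.
        for relation, entity in semantic.items():
            self.add_node(entity, "semantic")
            self.edges.append((sample_id, relation, entity, None))
        # Assumption: each numerical attribute gets one shared proxy node,
        # and the measured value is carried on the connecting edge.
        for attribute, value in numerical.items():
            proxy = f"proxy:{attribute}"
            self.add_node(proxy, "numerical_proxy")
            self.edges.append((sample_id, "has_value", proxy, float(value)))


# Made-up samples, used only to show the structure.
kg = CrossModalKG()
kg.add_sample("HEA-1", {"processed_by": "arc melting"},
              {"Al_fraction": 0.2, "Cr_fraction": 0.3})
kg.add_sample("HEA-2", {"processed_by": "arc melting"},
              {"Al_fraction": 0.1})

for edge in kg.edges:
    print(edge)
```

In this sketch, the two samples become linked through the shared `arc melting` and `proxy:Al_fraction` nodes, which is one way relationships between samples, and not only between features and properties, could be exposed to a downstream GNN.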