Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Yanpeng Ye, Jie Ren, Shaozhou Wang, Yuwei Wan, Haofen Wang, Imran Razzak, Bram Hoex, Tong Xie, Wenjie Zhang
2024-09-30
Abstract: Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges for efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques integrated with large language models to extract and systematically organize a decade's worth of high-quality research into structured triples. The resulting graph contains 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration. By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods. This structured approach not only streamlines materials research but also lays the groundwork for more sophisticated science knowledge graphs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the dispersion of knowledge in materials science research and the difficulty of integrating it efficiently. Specifically:

1. **Knowledge Dispersion**: Knowledge in materials science is spread across a vast body of scientific literature, which makes discovering and integrating new materials challenging. Traditional research methods often rely on time-consuming and costly experiments, further hindering rapid innovation.
2. **Data Extraction and Annotation**: Although combining artificial intelligence with materials science offers new ways to accelerate discovery, this approach requires precise data extraction and annotation, as well as traceability of information. Existing databases can store structured data and handle basic queries, but they are limited in capturing complex relationships and inferring new knowledge from the data.
3. **Interdisciplinary Collaboration**: Materials science is highly specialized, and researchers working in one direction often find it difficult to efficiently access and understand material knowledge from other fields. For example, researchers working on solar cells may not fully understand related research on solid-state batteries or organic light-emitting diodes, even though the electronic properties studied in these fields are closely related.
4. **Challenges of Existing Knowledge Graphs**: Existing materials knowledge graphs (such as the MatKG series) have made some progress but still face several challenges. First, training data still requires extensive annotation to improve model accuracy. Second, because relationships between nodes are predicted, entities in the graph are not always based on real instances, which can affect the authenticity and credibility of the graph. Third, dynamically updating the graph is difficult, especially in a rapidly developing field like materials science.

To address these issues, this paper introduces the Materials Knowledge Graph (MKG), which uses advanced natural language processing techniques combined with large language models to extract structured triples from a decade of high-quality research and organize them systematically. Through network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods, thereby streamlining the materials research process and laying the foundation for more sophisticated scientific knowledge graphs.
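
To make the idea of structured triples and network-based link prediction more concrete, here is a minimal Python sketch. It is not the paper's method: the source does not specify which algorithms MKG uses, so a simple Jaccard-neighborhood heuristic stands in for them. All entity names (`MatA`, `MatB`, `MatC`), relation names (`has_property`, `has_application`), and function names are hypothetical placeholders, not data or identifiers taken from MKG.

```python
from collections import defaultdict

# Toy triples in (head, relation, tail) form. Every entity and relation
# name here is a hypothetical placeholder, not data extracted from MKG.
TRIPLES = [
    ("MatA", "has_property", "wide_band_gap"),
    ("MatA", "has_property", "high_stability"),
    ("MatA", "has_application", "photocatalysis"),
    ("MatA", "has_application", "solar_cell"),
    ("MatB", "has_property", "wide_band_gap"),
    ("MatB", "has_property", "high_stability"),
    ("MatB", "has_application", "photocatalysis"),
    ("MatC", "has_property", "high_conductivity"),
    ("MatC", "has_application", "battery_electrode"),
]


def build_neighbors(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_link(neighbors, material, application):
    """Score an unobserved material-application link.

    Assumed heuristic: the highest Jaccard similarity between `material`
    and any other material already linked to `application`.
    """
    known = neighbors[application] - {material}
    return max(
        (jaccard(neighbors[material], neighbors[m]) for m in known),
        default=0.0,
    )


neighbors = build_neighbors(TRIPLES)
print(score_link(neighbors, "MatB", "solar_cell"))  # 0.75
print(score_link(neighbors, "MatC", "solar_cell"))  # 0.0
```

In this toy graph, `MatB` shares three of four neighbors with `MatA`, which is already linked to `solar_cell`, so the candidate link scores 0.75, while `MatC` shares none and scores 0.0. A real system would rank many such candidates and validate the top ones against the literature or experiment.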