GPTArticleExtractor: An Automated Workflow for Magnetic Material Database Construction

Yibo Zhang, Suman Itani, Kamal Khanal, Emmanuel Okyere, Gavin Smith, Koichiro Takahashi, Jiadong Zang
2024-01-11
Abstract: A comprehensive database of magnetic materials is valuable for researching the properties of magnetic materials and discovering new ones. This article introduces a novel workflow that leverages large language models for extracting key information from scientific literature. From 22,120 articles in the Journal of Magnetism and Magnetic Materials, a database containing 2,035 magnetic materials was automatically generated, with ferromagnetic materials constituting 76% of the total. Each entry in the database includes the material's chemical compounds, as well as related structures (space group, crystal structure) and magnetic temperatures (Curie, Néel, and other transitional temperatures). To ensure data accuracy, we meticulously compared each entry in the database against the original literature, verifying the precision and reliability of each entry.
Materials Science
What problem does this paper attempt to address?
The paper attempts to address the problem of constructing a comprehensive magnetic materials database to facilitate the study of magnetic material properties and the discovery of new magnetic materials. Specifically, the paper introduces a new workflow that utilizes large language models (LLMs) to automatically extract key information from scientific literature, thereby generating a database containing 2,035 magnetic materials. The information about these materials includes chemical composition, related structures (space group, crystal structure), and magnetic temperatures (Curie temperature, Néel temperature, etc.). To ensure data accuracy, the researchers carefully compared each entry in the database with the original literature to verify its precision and reliability.

### Main Issues and Solutions:

1. **Lack of a comprehensive magnetic materials database**:
   - Existing databases either lack information on magnetic properties (such as Curie temperature, Néel temperature) or are too small to support complex deep learning models.
   - **Solution**: By using an automated workflow, leveraging large language models to extract key information from a vast amount of scientific literature, a database containing 2,035 magnetic materials was constructed.
2. **Accuracy and reliability of data extraction**:
   - Traditional methods (such as rule-based extraction tools) have limitations in adapting to different writing styles and layouts of articles, which may lead to inaccurate data.
   - **Solution**: Through meticulous prompt engineering, large language models are guided to accurately extract data, and manual verification ensures data quality.
3. **Data-driven materials discovery**:
   - High-throughput methods, while efficient in screening materials with specific properties, are limited by the insufficiency of first-principles calculations, especially in predicting the properties of magnetic materials.
   - **Solution**: By using large language models to extract experimentally validated data, a high-quality database is constructed to support data-driven materials discovery methods.

### Main Contributions of the Paper:

- **Automated Workflow**: Developed GPTArticleExtractor, which uses GPT-3.5 and GPT-4 models to automatically extract key information from scientific literature.
- **Data Quality Assurance**: Ensured the accuracy and reliability of extracted data through manual verification and prompt engineering.
- **Large-scale Database**: Constructed a database containing 2,035 magnetic materials, covering chemical composition, magnetic temperatures, and structural information.

Through these methods, the paper provides strong support for the research of magnetic materials, promoting the discovery and application of new magnetic materials.
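
### Illustrative Sketch: Database Entry and Reply Validation

The summary describes entries that hold a chemical composition, structural information, and magnetic temperatures, filled in from LLM output and then checked by hand. The Python sketch below shows one way such a record and a basic validation step might look. It is not the paper's implementation: the `MagneticEntry` field names, the JSON reply format, and the `parse_llm_reply` helper are all assumptions made for illustration, and the call to GPT-3.5 or GPT-4 itself is deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MagneticEntry:
    # Hypothetical record layout; field names are assumptions, not the paper's schema.
    formula: str
    space_group: Optional[str] = None
    crystal_structure: Optional[str] = None
    curie_temperature_k: Optional[float] = None
    neel_temperature_k: Optional[float] = None


def _clean_text(value) -> Optional[str]:
    # Keep non-empty strings only, with surrounding whitespace removed.
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def _clean_temperature(value) -> Optional[float]:
    # Accept only non-negative numbers (kelvin); anything else becomes None.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value) if value >= 0 else None


def parse_llm_reply(reply_text: str) -> List[MagneticEntry]:
    """Parse a model reply assumed to be a JSON list of objects."""
    try:
        records = json.loads(reply_text)
    except json.JSONDecodeError:
        return []  # malformed reply: nothing usable, leave it for manual review
    if not isinstance(records, list):
        return []
    entries = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        formula = _clean_text(rec.get("formula"))
        if formula is None:
            continue  # an entry without a composition cannot be stored
        entries.append(
            MagneticEntry(
                formula=formula,
                space_group=_clean_text(rec.get("space_group")),
                crystal_structure=_clean_text(rec.get("crystal_structure")),
                curie_temperature_k=_clean_temperature(rec.get("curie_temperature_k")),
                neel_temperature_k=_clean_temperature(rec.get("neel_temperature_k")),
            )
        )
    return entries


# Illustrative input only; these records are not taken from the paper's database.
sample_reply = (
    '[{"formula": "Fe", "crystal_structure": "bcc", "curie_temperature_k": 1043},'
    ' {"formula": "", "curie_temperature_k": 300},'
    ' {"formula": "X2Y", "neel_temperature_k": -5}]'
)
entries = parse_llm_reply(sample_reply)
for entry in entries:
    print(entry)
```

Automatic checks of this kind only catch structural problems such as a missing formula or a negative temperature. They do not replace the comparison of each entry against the original article that the paper relies on for data quality.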