Text-to-Battery Recipe: A language modeling-based protocol for automatic battery recipe extraction and retrieval

Daeun Lee, Jaewoong Choi, Hiroshi Mizuseki, Byungju Lee
2024-07-22
Abstract: Recent studies have increasingly applied natural language processing (NLP) to automatically extract experimental research data from the extensive battery materials literature. Despite the complex process involved in battery manufacturing -- from material synthesis to cell assembly -- there has been no comprehensive study systematically organizing this information. In response, we propose a language modeling-based protocol, Text-to-Battery Recipe (T2BR), for the automatic extraction of end-to-end battery recipes, validated using a case study on batteries containing LiFePO4 cathode material. We report machine learning-based paper filtering models, screening 2,174 relevant papers from the keyword-based search results, and unsupervised topic models to identify 2,876 paragraphs related to cathode synthesis and 2,958 paragraphs related to cell assembly. Then, focusing on the two topics, two deep learning-based named entity recognition models are developed to extract a total of 30 entities -- including precursors, active materials, and synthesis methods -- achieving F1 scores of 88.18% and 94.61%. The accurate extraction of entities enables the systematic generation of 165 end-to-end recipes of LiFePO4 batteries. Our protocol and results offer valuable insights into specific trends, such as associations between precursor materials and synthesis methods, or combinations between different precursor materials. We anticipate that our findings will serve as a foundational knowledge base for facilitating battery-recipe information retrieval. The proposed protocol will significantly accelerate the review of battery material literature and catalyze innovations in battery design and development.
Computation and Language, Materials Science
What problem does this paper attempt to address?
The main objective of this paper is to propose a language model-based protocol, Text-to-Battery Recipe (T2BR), for automatically extracting battery recipe information from the scientific literature. Specifically, the study aims to address the following key issues:

1. **Comprehensive organization of battery manufacturing information**: Although natural language processing (NLP) has been applied to automatically extract experimental data from the large body of battery materials literature, no study has systematically organized the information covering the whole process, from material synthesis to cell assembly.
2. **Automatic extraction of end-to-end battery recipes**: To better understand and analyze battery performance, complete battery recipe information is needed, i.e., all steps from the synthesis of electrode materials to cell assembly. Previous studies have typically focused on limited information only, such as the names of battery materials or material synthesis recipes, without covering the entire process.
3. **Establishing a battery recipe database**: Automatically extracted end-to-end battery recipes can be collected into a database of this detailed recipe information, providing a foundational knowledge base for subsequent battery design and development.

To address these issues, the paper adopts the following methods:

- **Machine learning models to screen relevant literature**: Machine learning models first filter the collected literature so that the selected papers actually contain information about battery recipes.
- **Topic modeling to identify key paragraphs**: Unsupervised topic modeling techniques (e.g., Latent Dirichlet Allocation, LDA) are used to identify paragraphs related to electrode material synthesis and cell assembly.
- **Named Entity Recognition (NER) models to extract key information**: NER models based on pre-trained language models are developed to extract specific entities from the selected paragraphs, including precursors, active materials, and synthesis methods.
- **Generating end-to-end battery recipes**: From the extracted entities and synthesis actions, sequences describing the electrode material synthesis and cell assembly processes are generated, resulting in 165 end-to-end battery recipes.

In summary, the paper aims to fill a gap in existing research by proposing a new protocol that enables a more comprehensive and systematic understanding of the battery manufacturing process and supports future battery research and development.
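The data flow of the stages above (assign each paragraph to a topic, extract entities, merge them into one recipe) can be illustrated with a toy sketch. This is not the paper's implementation: the keyword sets in `TOPIC_KEYWORDS` are a hypothetical stand-in for the unsupervised topic model, the dictionary `ENTITY_LEXICON` is a hypothetical stand-in for the trained NER models, and all function names, labels, and example entries are assumptions made for illustration.

```python
# Toy illustration of the paragraph -> topic -> entities -> recipe flow.
# Keyword matching and dictionary lookup replace the learned models here.

# Hypothetical keywords standing in for the two topics of interest.
TOPIC_KEYWORDS = {
    "cathode_synthesis": {"precursor", "calcined", "sintered", "hydrothermal"},
    "cell_assembly": {"slurry", "electrolyte", "separator", "coin", "binder"},
}

# Hypothetical gazetteer standing in for NER output labels.
ENTITY_LEXICON = {
    "FeSO4": "precursor",
    "LiOH": "precursor",
    "H3PO4": "precursor",
    "hydrothermal": "synthesis_method",
    "LiFePO4": "active_material",
    "PVDF": "binder",
    "LiPF6": "electrolyte_salt",
}


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    return [tok.strip(".,;()") for tok in text.split()]


def assign_topic(paragraph):
    """Return the topic with the most keyword hits, or None if there are none."""
    tokens = {tok.lower() for tok in tokenize(paragraph)}
    scores = {topic: len(tokens & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_entities(paragraph):
    """Map each entity label to the lexicon entries found, in order of appearance."""
    found = {}
    for tok in tokenize(paragraph):
        label = ENTITY_LEXICON.get(tok)
        if label is not None and tok not in found.setdefault(label, []):
            found[label].append(tok)
    return found


def build_recipe(paragraphs):
    """Merge per-paragraph entities into one recipe keyed by topic."""
    recipe = {topic: {} for topic in TOPIC_KEYWORDS}
    for paragraph in paragraphs:
        topic = assign_topic(paragraph)
        if topic is None:
            continue  # paragraph is unrelated to either topic
        for label, values in extract_entities(paragraph).items():
            slot = recipe[topic].setdefault(label, [])
            slot.extend(v for v in values if v not in slot)
    return recipe
```

Calling `build_recipe` on the paragraphs of a single paper returns a nested dictionary with one entry per topic. In the actual protocol, each stand-in would be replaced by the corresponding trained model, and the paper-filtering stage would run before any paragraph reaches this step.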