Dielectric Ceramics Database Automatically Constructed by Data Mining in the Literature

Xiaochao Wang, Wanli Zhang, Wenxu Zhang
DOI: https://doi.org/10.1021/acs.jcim.4c00282
IF: 6.162
2024-01-01
Journal of Chemical Information and Modeling
Abstract: The vast published literature on dielectric ceramics is a natural database for big-data analysis, the discovery of structure-property relationships, and property prediction. We constructed a data-mining pipeline based on natural language processing (NLP) to extract property information from about 12,900 published dielectric ceramics articles and normalized more than 20 properties. The micro-F1 scores for sentence classification, named entity recognition, relation extraction (related), and relation extraction (same) are 91.6, 82.4, 91.4, and 88.3%, respectively. We show how several essential properties are distributed by publication year to reveal trends. To test the reliability of the data extraction, we trained an XGBoost model to predict the dielectric constant and used the SHAP module to interpret the contribution of each feature, with the aim of identifying some of the factors that determine the dielectric properties. The results show that including Q × f in the model can increase the accuracy of the dielectric constant prediction. Our work can give experimentalists some hints on how to improve the performance of cutting-edge materials.
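The abstract reports micro-F1 scores for each pipeline stage. As a reminder of what that metric measures, here is a minimal sketch in Python. It is not the authors' evaluation code: the function name `micro_f1`, the `ignore_label` parameter, and the example tags (`MAT`, `PROP`, `VAL`, `O`) are assumptions made purely for illustration. Micro-averaging pools true positives, false positives, and false negatives over all classes before computing precision and recall.

```python
def micro_f1(gold, pred, ignore_label=None):
    """Micro-averaged F1 for paired gold/predicted labels.

    Counts are pooled over all classes. `ignore_label` (hypothetical
    parameter) marks a background class, such as "O" in tagging tasks,
    that should not count as a positive prediction.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore_label:
                tp += 1          # correct non-background prediction
        else:
            if p != ignore_label:
                fp += 1          # predicted a class that is wrong
            if g != ignore_label:
                fn += 1          # missed the true class

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative token-level tags (made up, not taken from the paper).
gold = ["MAT", "PROP", "O", "VAL", "O", "PROP"]
pred = ["MAT", "O", "O", "VAL", "PROP", "PROP"]
score = micro_f1(gold, pred, ignore_label="O")
print(f"micro-F1 = {score:.3f}")
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75. When no label is ignored and each item has exactly one label, micro-F1 reduces to plain accuracy.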