Classification of battery compounds using structure-free Mendeleev encodings

Zixin Zhuang, Amanda S. Barnard
DOI: https://doi.org/10.1186/s13321-024-00836-x
2024-04-28
Journal of Cheminformatics
Abstract: Machine learning is a valuable tool that can accelerate the discovery and design of materials occupying combinatorial chemical spaces. However, the prerequisite need for vast amounts of training data can be prohibitive when significant resources are needed to characterize or simulate candidate structures. Recent results have shown that structure-free encoding of complex materials, based entirely on chemical compositions, can overcome this impediment and perform well in unsupervised learning tasks. In this study, we extend this exploration to supervised classification, and show how structure-free encoding can accurately predict classes of material compounds for battery applications without time-consuming measurement of bonding networks, lattices or densities.
chemistry, multidisciplinary, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
The paper addresses the problem of classifying battery compounds from their chemical compositions alone. Machine learning normally requires large amounts of training data, which can be prohibitive when characterizing or simulating candidate structures takes significant resources. Earlier work showed that structure-free encodings, which rely solely on chemical composition, perform well in unsupervised learning tasks. This paper extends that approach to supervised classification, predicting classes of material compounds for battery applications without measuring bonding networks, lattices, or densities.
The main contributions of the paper are:
1. A comprehensive evaluation of structure-free encodings (one-hot encoding, Mendeleev encoding, and combined Mendeleev encoding) in binary and multiclass classification tasks, using three classifiers with different logics and four evaluation metrics.
2. Application of these encodings to experimental and computational datasets, with several visualization methods used to confirm the applicability and superiority of Mendeleev encoding.
3. A demonstration that Mendeleev encoding accurately predicts battery material categories without considering bonding, symmetry, density, or disorder, which simplifies the material design process.
The methodology compares the encodings using logistic regression, decision trees, and support vector machines as classification algorithms. Model performance and generalization are evaluated with learning curves, classification reports, confusion matrices, and ROC curves. The results show that Mendeleev encoding performs well in classification tasks, especially for battery materials, achieving high-accuracy predictions even without structural information.
This indicates that encoding based on elemental composition can accelerate the discovery and design of new materials, particularly in the research of energy storage systems such as batteries.
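As a rough illustration of what a composition-only encoding and the evaluation step can look like, the sketch below turns a chemical formula into a fixed-length vector of element fractions and computes a confusion matrix with per-class precision and recall. It is not the paper's Mendeleev encoding or its classifiers: the element list `ELEMENTS`, the helper names (`parse_formula`, `encode`, `confusion_matrix`, `precision_recall`), and the toy labels are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical element vocabulary for this toy example; a real study would
# include every element that occurs in the dataset.
ELEMENTS = ["Li", "O", "P", "Mn", "Fe", "Co"]

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'LiFePO4'.

    Parentheses and hydrates are not handled in this sketch.
    """
    counts = Counter()
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    return dict(counts)


def encode(formula, elements=ELEMENTS):
    """Encode a formula as element fractions; no structural data is used."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall(matrix, k):
    """Precision and recall for the class at row/column k."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column sum
    actual = sum(matrix[k])                    # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    print(encode("LiFePO4"))
    # Toy labels and predictions, invented purely for illustration.
    m = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
    print(m, precision_recall(m, 0))
```

Vectors produced by `encode` could be passed to any standard classifier, such as the logistic regression, decision tree, and support vector machine models named in the summary. The element-fraction scheme here is only a stand-in for the encodings the paper actually evaluates.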