Composition and structure analyzer/featurizer for explainable machine-learning models to predict solid state structures

Sangjoon Lee, Anton Oliynyk, Emil Jaffal, Danila Shiryaev, Alex Vtorov, Nikhil Barua, Holger Kleinke
DOI: https://doi.org/10.26434/chemrxiv-2024-rrbhc
2024-10-15
Abstract: Traditional and non-classical machine learning models for solid-state structure prediction have predominantly relied on compositional features (derived from properties of the constituent elements) to predict the existence of a structure and its properties. However, the lack of structural information can lead to suboptimal property mapping and increased predictive uncertainty. To address this challenge, we introduce a strategy that generates and combines both compositional and structural features while requiring minimal programming expertise. Our approach uses open-source, interactive Python programs named Composition Analyzer Featurizer (CAF) and Structure Analyzer Featurizer (SAF). CAF generates numerical compositional features from a list of formulas provided in an Excel file, while SAF extracts numerical structural features from a .cif file by generating a supercell. A total of 133 features from CAF and 94 features from SAF were used, either individually or in combination, to cluster nine structure types in equiatomic AB intermetallics. The performance was comparable to that obtained with features from state-of-the-art featurizers in advanced machine learning models. Our SAF+CAF features provided a cost-efficient and reliable solution, even with the PLS-DA method, where a significant fraction of the most contributing features were the same as those identified in the more computationally intensive XGBoost models.
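To make the idea of compositional features concrete, the following is a minimal sketch of how numerical features might be derived from a chemical formula using properties of the constituent elements. It is not the CAF implementation: the function names (`parse_formula`, `featurize`), the feature names, and the four-element lookup table are hypothetical and chosen only for illustration. The table holds atomic numbers and Pauling electronegativities.

```python
import re

# Small illustrative lookup table (assumption: not the data set used by CAF).
# "Z" is the atomic number, "chi" the Pauling electronegativity.
ELEMENT_DATA = {
    "Na": {"Z": 11, "chi": 0.93},
    "Cl": {"Z": 17, "chi": 3.16},
    "Fe": {"Z": 26, "chi": 1.83},
    "Si": {"Z": 14, "chi": 1.90},
}


def parse_formula(formula):
    """Split a simple formula such as 'Fe2Si' into {element: amount}.

    Handles only flat formulas without parentheses.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula):
    """Return a dict of hypothetical compositional features for one formula."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    features = {}
    for prop in ("Z", "chi"):
        values = [ELEMENT_DATA[symbol][prop] for symbol in counts]
        fractions = [counts[symbol] / total for symbol in counts]
        # Composition-weighted mean of the elemental property.
        features[f"{prop}_weighted_mean"] = sum(
            f * v for f, v in zip(fractions, values)
        )
        features[f"{prop}_max"] = max(values)
        features[f"{prop}_min"] = min(values)
        features[f"{prop}_range"] = max(values) - min(values)
    return features


if __name__ == "__main__":
    for name, value in featurize("FeSi").items():
        print(f"{name}: {value}")
```

In a workflow like the one the abstract describes, a function of this kind would be applied to every formula in the input list, and the resulting feature table could be combined column-wise with structural features before clustering or classification.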
Chemistry