MIC-SHAP: An ensemble feature selection method for materials machine learning

Junya Wang, Pengcheng Xu, Xiaobo Ji, Minjie Li, Wencong Lu
DOI: https://doi.org/10.1016/j.mtcomm.2023.106910
IF: 3.8
2023-08-18
Materials Today Communications
Abstract: Feature selection plays a significant role in the materials machine learning workflow, but most current work in the field relies on single or stepwise feature selection methods. This work proposes a new ensemble feature selection method, MIC-SHAP, which combines the SHapley Additive exPlanations (SHAP) method with the maximal information coefficient (MIC) method. The effectiveness of the method was evaluated on three different materials datasets collected from publications. The results demonstrate that MIC-SHAP outperforms commonly used feature selection methods, preserving prediction accuracy while greatly reducing model complexity. The highest feature reduction rate is 91.67%, while the R² of the 10-fold cross-validation reaches 0.98. MIC-SHAP can select the optimal feature subset quickly, avoiding repeated trials of different feature selection methods. Moreover, it can increase the stability and interpretability of feature selection, supporting the subsequent process of materials design and discovery.
Data availability: The datasets and code for the current study are publicly available on GitHub at junya-wq/efs (github.com). Data are also available within the article or its supplementary materials.
materials science, multidisciplinary