Concentration Division for Adsorption Coefficient Prediction Using Machine Learning with Abraham Descriptors: Data-Splitting Approach Comparison and Critical Factors Identification

Zhenguo Qi, Shifa Zhong, Xin Huang, Yucui Xu, Haoze Zhang, Baoyou Shi
DOI: https://doi.org/10.1016/j.carbon.2024.119573
IF: 10.9
2024-01-01
Carbon
Abstract: Machine learning (ML) with Abraham descriptors from polyparameter linear free energy relationships (pp-LFERs) has been a popular method for predicting the adsorption coefficient (Kd). However, Abraham descriptors from pp-LFERs are concentration-dependent, and the significance of these descriptors can change across adsorbate concentrations. Ignoring concentration effects on the adsorption process and on Kd prediction hinders the understanding of interactions among solutes, solvents and adsorbents in different equilibrium concentration (Ce) ranges. Therefore, our study first systematically investigated the concentration effects on micropollutant adsorption to carbon-based adsorbents using ML with Abraham descriptors. The concentration-selection approach, a new data-splitting approach, divided the whole dataset according to Ce range. This concentration-selection approach performed better than the data-splitting approach used in previous studies. After the ML models were built on the different Ce subsets, Shapley values were calculated to quantify the contributions of the input descriptors. The results indicated that specific surface area (BET) was the only critical factor when Ce was in the highest range. The importance of Abraham descriptors increased gradually as Ce decreased. Total pore volume (Vt) was a far less important feature than BET for Kd prediction. The critical factors identified in each Ce range for Kd prediction provide guidance for the design of novel carbon-based adsorbents.
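The concentration-based division described in the abstract can be illustrated with a minimal sketch. The abstract does not give the Ce boundaries, units, or number of subsets used in the study, so the `edges` values, the record fields, and the helper name `split_by_ce` below are all hypothetical and chosen only to show the idea of grouping isotherm points by Ce range before a separate model is fitted to each subset.

```python
from bisect import bisect_right


def split_by_ce(records, edges, key="Ce"):
    """Group records into len(edges) + 1 subsets by equilibrium concentration.

    `edges` must be sorted in ascending order. A record whose Ce equals an
    edge is placed in the higher subset.
    """
    subsets = [[] for _ in range(len(edges) + 1)]
    for rec in records:
        subsets[bisect_right(edges, rec[key])].append(rec)
    return subsets


# Hypothetical data points; values are illustrative, not taken from the paper.
records = [
    {"Ce": 0.002, "logKd": 5.1},
    {"Ce": 0.05, "logKd": 4.3},
    {"Ce": 0.8, "logKd": 3.6},
    {"Ce": 12.0, "logKd": 2.9},
    {"Ce": 0.3, "logKd": 3.9},
]

# Assumed boundaries between the low, middle and high Ce ranges.
edges = [0.01, 1.0]

low, mid, high = split_by_ce(records, edges)
for name, subset in zip(("low", "mid", "high"), (low, mid, high)):
    print(name, len(subset))
```

Each resulting subset would then be used to train and interpret its own model, which is what allows descriptor importance to be compared across Ce ranges.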