Modeling hydrogen solubility in water: Comparison of adaptive boosting support vector regression, gene expression programming, and cubic equations of state

Qichao Lv, Tongke Zhou, Haimin Zheng, Behnam Amiri-Ramsheh, Fahimeh Hadavimoghaddam, Abdolhossein Hemmati-Sarapardeh, Xiaochen Li, Longxuan Li
DOI: https://doi.org/10.1016/j.ijhydene.2023.12.227
IF: 7.2
2024-01-18
International Journal of Hydrogen Energy
Abstract: Predicting the solubility of hydrogen (H₂) in aqueous solutions is crucial for studying reactions of hydrogen in the formation, and it also affects the safety and optimal design of hydrogen storage. In this research, five robust machine learning (ML) algorithms, namely adaptive boosting decision tree (AdaBoost-DT), adaptive boosting support vector regression (AdaBoost-SVR), gradient boosting decision tree (GB-DT), gradient boosting support vector regression (GB-SVR), and k-nearest neighbors (KNN), and three powerful white-box techniques, namely gene expression programming (GEP), genetic programming (GP), and group method of data handling (GMDH), were developed to accurately predict H₂ solubility in pure and saline water systems. To this end, an extensive databank containing 427 experimental data points was collected, with temperature, pressure, and salt concentration (mSalt) as input variables. The validity and precision of the developed models were assessed using several statistical and graphical tests. The results demonstrate that the AdaBoost-SVR model achieves the best performance, with a root mean square error (RMSE) of 0.000115 and a determination coefficient (R²) of 0.9973. Among the white-box models, GEP provided the best results, with an RMSE of 0.000362 and an R² of 0.9542. Although the accuracy of GEP is lower than that of AdaBoost-SVR, it offers an explicit and simple mathematical formula for calculating H₂ solubility, which is the main advantage of white-box models. The results also demonstrate that AdaBoost-SVR outperforms cubic equations of state (EOSs) such as Peng-Robinson (PR), Redlich-Kwong (RK), Soave-Redlich-Kwong (SRK), and Zudkevitch-Joffe (ZJ). In addition, trend analysis showed that the AdaBoost-SVR model matches the actual trends of H₂ solubility with temperature and pressure. Finally, outlier detection using the Leverage technique indicated that the majority of the data points used for modeling (nearly 94%) are reliable and fall within the valid zone.
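As a reading aid for the two statistics quoted in the abstract, the following is a minimal sketch of how RMSE and the determination coefficient are conventionally computed. The function names and the sample values are hypothetical and chosen only for illustration; they are not the paper's code or data, and the paper may use a different variant of either metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Determination coefficient, computed as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility values, for illustration only (not from the paper).
measured = [0.0010, 0.0020, 0.0030, 0.0040]
predicted = [0.0011, 0.0019, 0.0031, 0.0039]

print("RMSE:", rmse(measured, predicted))
print("R2:", r_squared(measured, predicted))
```

Because RMSE carries the units of the target variable, a value such as 0.000115 is only meaningful relative to the magnitude of the solubilities being predicted, whereas R² is dimensionless and can be compared across models directly.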
energy & fuels,electrochemistry,chemistry, physical