Comparison of Machine Learning Methods for Predicting the Methane Production from Anaerobic Digestion of Lignocellulosic Biomass

Zhengxin Wang, Xinggan Peng, Ao Xia, Akeel A. Shah, Huchao Yan, Yun Huang, Xianqing Zhu, Xun Zhu, Qiang Liao
DOI: https://doi.org/10.1016/j.energy.2022.125883
IF: 9
2022-01-01
Energy
Abstract: Biogas derived from the anaerobic digestion of biomass can provide a carbon-neutral resource for green energy supply in the future. The biochemical methane potential (BMP) test has been widely applied to assess the characteristics of methane production from anaerobic digestion in batch mode. However, the determination of key parameters in the BMP test, such as specific methane yield (SMY), usually requires long-term experiments, especially for lignocellulosic feedstocks with slow degradation rates. This study aims to propose an appropriate data-driven model for the efficient prediction of the SMY using data from 277 samples of various lignocellulosic biomass materials by evaluating ten different machine learning (ML) methods. The Pearson coefficient matrix indicates that the chemical components are more relevant as attributes for the ML models, compared to element compositions, and the content of lignin has a strong linear correlation with SMY. Classic nonlinear ML methods (R² ≥ 0.61) perform better than linear methods (R² ≤ 0.56), and an ensemble learning model (R² = 0.71) is better than a single learner (R² ≤ 0.67). A K-nearest neighbor (KNN) model using leave-one-out cross-validation (LOOCV) obtains the best performance (R² = 0.75, MAE = 30.2 mL/gVS). The generalization performance of the best model is found to have an average relative error of 10.05%.
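To make the evaluation procedure named in the abstract concrete, the following is a minimal sketch of KNN regression scored with leave-one-out cross-validation, reporting R² and MAE. It is not the authors' implementation. The feature names (cellulose, hemicellulose, lignin fractions), the synthetic target, the choice of k = 3, Euclidean distance, and unweighted averaging are all assumptions made for illustration, and the toy data are randomly generated rather than taken from the paper's 277 samples.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`."""
    # Euclidean distance is an assumption; the abstract does not state the metric.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loocv_predictions(X, y, k=3):
    """Leave-one-out CV: predict each sample from all the other samples."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return preds


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Synthetic toy data (hypothetical): rows are (cellulose, hemicellulose,
    # lignin) mass fractions; the target is an invented SMY-like value that
    # decreases with lignin content, plus noise. Not the paper's data.
    random.seed(0)
    X = [(random.uniform(0.2, 0.5), random.uniform(0.1, 0.4),
          random.uniform(0.05, 0.3)) for _ in range(50)]
    y = [400 * c + 300 * h - 600 * l + random.gauss(0, 15) for c, h, l in X]

    preds = loocv_predictions(X, y, k=3)
    print("LOOCV R2:", round(r2_score(y, preds), 3))
    print("LOOCV MAE:", round(mean_absolute_error(y, preds), 2))
```

Because every sample is held out exactly once, LOOCV uses the whole dataset for both training and testing, which is one common choice when the number of samples is small.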