Developing high-dimensional machine learning models to improve generalization ability and overcome data insufficiency for mixed sugar fermentation simulation
Xiao-Yan Huang, Tian-Jie Ao, Xue Zhang, Kai Li, Xin-Qing Zhao, Verawat Champreda, Weerawat Runguphan, Chularat Sakdaronnarong, Chen-Guang Liu, Feng-Wu Bai
DOI: https://doi.org/10.1016/j.biortech.2023.129375
IF: 11.4
2023-06-27
Bioresource Technology
Abstract: Biorefinery can be promoted by building accurate machine learning models. This work proposes a strategy to enhance the generalization ability of models and to overcome data insufficiency in mixed sugar fermentation simulation. Multiple-input single-output models, which use initial glucose, initial xylose, and time together as inputs, generalize better than single-input single-output models that use time as the sole input when predicting glucose, xylose, ethanol, or biomass separately. Multiple-input multiple-output models, which integrate these outputs, enhanced model accuracy and reached an average R² of 0.99. To overcome data insufficiency, a consensus yeast (CY) model, built by consolidating data from 4 yeasts, obtained an R² of 0.90. Adjusting the pretrained CY model saved more than 50% of the data while reaching R² values of 0.95 and 0.93 for yeast and bacterial fermentation simulation, respectively. The strategy can expand the application range of artificial neural network (ANN) models and reduce the cost of data curation.
energy & fuels, biotechnology & applied microbiology, agricultural engineering
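
The three input/output arrangements named in the abstract, and the R² metric it reports, can be illustrated with a minimal sketch. This is not the authors' code: the `Sample` record, its field names, the units, and the numeric values are hypothetical placeholders chosen only to show the shape of the data each kind of model would see.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One fermentation measurement (hypothetical fields; units assumed g/L and h)."""
    glucose0: float  # initial glucose
    xylose0: float   # initial xylose
    time_h: float    # sampling time
    glucose: float   # measured glucose at time_h
    xylose: float    # measured xylose at time_h
    ethanol: float   # measured ethanol at time_h
    biomass: float   # measured biomass at time_h


Matrix = List[List[float]]


def siso(samples: List[Sample], target: str) -> Tuple[Matrix, Matrix]:
    """Single input (time) -> single output (one target variable)."""
    X = [[s.time_h] for s in samples]
    y = [[getattr(s, target)] for s in samples]
    return X, y


def miso(samples: List[Sample], target: str) -> Tuple[Matrix, Matrix]:
    """Multiple inputs (initial glucose, initial xylose, time) -> single output."""
    X = [[s.glucose0, s.xylose0, s.time_h] for s in samples]
    y = [[getattr(s, target)] for s in samples]
    return X, y


def mimo(samples: List[Sample]) -> Tuple[Matrix, Matrix]:
    """Multiple inputs -> all four outputs predicted together."""
    X = [[s.glucose0, s.xylose0, s.time_h] for s in samples]
    Y = [[s.glucose, s.xylose, s.ethanol, s.biomass] for s in samples]
    return X, Y


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented placeholder measurements, not data from the paper.
samples = [
    Sample(40.0, 20.0, 0.0, 40.0, 20.0, 0.0, 0.5),
    Sample(40.0, 20.0, 12.0, 5.0, 18.0, 16.0, 2.0),
    Sample(40.0, 20.0, 24.0, 0.0, 9.0, 22.0, 2.8),
]

X_siso, y_siso = siso(samples, "ethanol")
X_miso, y_miso = miso(samples, "ethanol")
X_mimo, Y_mimo = mimo(samples)
```

With this layout, a single-input model cannot distinguish runs that start from different sugar concentrations, whereas the multiple-input arrangements carry the initial conditions in every row, which is one plausible reading of why the abstract reports better generalization for them.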