Application of Stacking Ensemble Learning Model in Quantitative Analysis of Biomaterial Activity

Hao Cao, Youlin Gu, Jiajie Fang, Yihua Hu, Wanying Ding, Haihao He, Guolong Chen
DOI: https://doi.org/10.1016/j.microc.2022.108075
IF: 5.304
2022-01-01
Microchemical Journal
Abstract: Quantitative analysis techniques based on attenuated total reflection Fourier transform infrared spectroscopy (ATR FT-IR) are widely used to detect the components of cells rather than their activity levels. In this study, a rapid, nondestructive method for detecting the activity of biomaterials is proposed. The method is based on infrared spectroscopy: it analyzes the infrared absorption peaks of three different biomaterials before and after inactivation, and from these obtains the changes in their surface functional groups after inactivation. Based on the regular differences in their absorption spectra, a stacking ensemble learning model is used to accurately detect the activity ratio of the biomaterials. In the two-level fusion framework of the ensemble model, partial least squares regression (PLSR), gradient boosted decision tree (GBDT), random forest (RF) and extra trees (ET) are used as primary learners, and linear regression is used as the secondary learner. Duplicate and interfering data in the raw spectra are eliminated by multiplicative scatter correction (MSC) and principal component analysis (PCA). The coefficients of determination of the prediction set (R2p) for the three biomaterials were 0.9641, 0.9946 and 0.9939, respectively. The root mean square errors of prediction (RMSEP) were 5.7%, 2.1% and 2.3%, respectively. Compared with the single-algorithm models, the stacking ensemble learning model has the highest R2p and the lowest RMSEP. The results reveal that the fusion model can enhance the generalization ability and prediction accuracy for detecting the activity of biomaterials under the influence of various factors. This study not only provides a new method for detecting biomaterial activity, but also offers guidance for fusion algorithms.
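The abstract names MSC as a preprocessing step and reports R2p and RMSEP as evaluation metrics. The following is a minimal sketch of the textbook definitions of these three quantities, using only the Python standard library. It is not the authors' implementation: the function names (`msc`, `r2_score`, `rmsep`) and the choice of the mean spectrum as the MSC reference are assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction (assumed reference: the mean spectrum).

    Each spectrum is fitted as intercept + slope * reference by least squares,
    then corrected as (spectrum - intercept) / slope.
    """
    n_points = len(spectra[0])
    # Reference spectrum: mean absorbance at each spectral point.
    ref = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsep(y_true, y_pred):
    """Root mean square error of prediction, in the units of y."""
    return sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))
```

When the activity ratio is expressed in percent, `rmsep` returns a value in percentage points, which matches how the RMSEP figures in the abstract are quoted. Both metrics would be computed on a held-out prediction set rather than on the calibration data.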