Using simple average ensemble modelling for the quantitative analysis of four analytes in cut tobacco samples

Kong Haohui, Bai Wenliang, Li Hongru, Gan Feng
DOI: https://doi.org/10.3969/j.issn.1001-4160.2010.06.026
2010-01-01
Abstract: Reducing sugar, total sugar, total nitrogen and nicotine are important indicators of tobacco leaf quality. To keep product quality stable during production, tobacco enterprises need rapid quantitative analysis of these indicators for large numbers of samples after the leaves are cut. Partial least squares (PLS) regression is commonly used to build a single mathematical model relating the near-infrared (NIR) spectra of tobacco samples to their chemical values, which readily meets this practical need. However, such a model gives only a single predicted value for each sample, so the reliability of the prediction for an unknown sample cannot be estimated. Ensemble modelling aims to make maximal use of the information contained in the samples. It builds multiple locally optimal submodels, each of which gives its own prediction for a sample, and these predictions reflect the performance of the individual submodels. If the final prediction is taken as the average of the submodel predictions, the result is the simple average ensemble method. This method treats every submodel as equally important, regarding each one as carrying useful information from the modelling process, and thus avoids the uncertainty introduced by selecting a single model. In this paper, the simple average ensemble method was used for modelling, with a focus on constructing the best training subsets and selecting the optimal number of submodels, and it was compared with ordinary PLS modelling. The results show that the predictions of the simple average ensemble method were the same as those of ordinary PLS. Furthermore, the simple average ensemble method provides the standard deviation of the concentrations calculated by the different submodels, which could serve as a clue for evaluating the reliability of the calculated concentrations.
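The simple average ensemble idea described above can be sketched in a few lines: fit several PLS submodels on different training subsets, report the mean of their predictions as the final value, and report the standard deviation across submodels as a reliability indicator. This is a minimal sketch under stated assumptions, not the authors' implementation: the training subsets are assumed to be bootstrap resamples (the abstract does not say how the paper builds them), PLS1 is implemented here with the NIPALS algorithm, the function names (`pls1_fit`, `pls1_predict`, `fit_ensemble`, `predict_ensemble`) are hypothetical, the numbers of components and submodels are fixed rather than optimized as in the paper, and the demo data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1 via NIPALS).

    Returns (coef, x_mean, y_mean) so predictions are (X - x_mean) @ coef + y_mean.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xr = X - x_mean                    # centred X, deflated after each component
    yr = y - y_mean                    # centred y, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)         # unit-length weight vector
        t = Xr @ w                     # score vector
        tt = t @ t
        p = Xr.T @ t / tt              # X loading vector
        q = (yr @ t) / tt              # y loading (a scalar for PLS1)
        Xr = Xr - np.outer(t, p)       # remove this component from X
        yr = yr - q * t                # remove this component from y
        W.append(w)
        P.append(p)
        Q.append(q)
    W = np.array(W).T                  # (n_features, n_components)
    P = np.array(P).T                  # (n_features, n_components)
    Q = np.array(Q)                    # (n_components,)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def fit_ensemble(X, y, n_submodels=50, n_components=5, seed=0):
    """Fit one PLS submodel per training subset.

    ASSUMPTION: subsets are bootstrap resamples of the calibration set;
    the paper's own subset-construction rule is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_submodels):
        idx = rng.choice(n, size=n, replace=True)
        models.append(pls1_fit(X[idx], y[idx], n_components))
    return models


def predict_ensemble(models, X):
    """Simple average ensemble: every submodel gets equal weight.

    Returns the mean prediction and the standard deviation across submodels.
    """
    preds = np.array([pls1_predict(m, X) for m in models])   # (n_submodels, n_samples)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Synthetic stand-in for NIR spectra (3 latent factors + noise); NOT tobacco data.
    rng = np.random.default_rng(42)
    n_train, n_test, n_wavelengths = 80, 5, 200
    latent = rng.normal(size=(n_train + n_test, 3))
    loadings = rng.normal(size=(3, n_wavelengths))
    X = latent @ loadings + 0.05 * rng.normal(size=(n_train + n_test, n_wavelengths))
    y = latent @ np.array([1.5, -0.8, 0.3]) + 0.05 * rng.normal(size=n_train + n_test)

    models = fit_ensemble(X[:n_train], y[:n_train], n_submodels=30, n_components=3)
    mean_pred, sd_pred = predict_ensemble(models, X[n_train:])
    for true, pred, sd in zip(y[n_train:], mean_pred, sd_pred):
        print(f"reference={true:7.3f}  ensemble mean={pred:7.3f}  sd={sd:.3f}")
```

Because every submodel is weighted equally, the ensemble mean needs no model-selection step; the `ddof=1` argument makes the spread the sample standard deviation across submodels, which is the quantity the abstract proposes as a reliability clue.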