Abstract
Reducing sugar, total sugar, total nitrogen and nicotine are important indicators of tobacco leaf quality. After the leaf-cutting step, tobacco companies must rapidly quantify these four indicators in a large number of cut-tobacco samples to keep cigarette quality stable during production. This need is usually met by building a single partial least-squares (PLS) model that relates the near-infrared (NIR) spectra of the cut tobacco to its measured chemical values. However, such a model gives only one prediction per sample and cannot estimate how reliable the prediction for an unknown sample is. Ensemble modelling instead makes fuller use of the information among the samples by building multiple optimized local models (submodels), so that each sample receives a separate prediction from every submodel; these predictions reflect the predictive performance of the individual submodels. Taking the mean of these predictions as the final result gives the simple average ensemble method. It weights every submodel equally, treating each one as an important piece of information from the modelling process, and thereby avoids the uncertainty introduced by selecting a single model. In this paper, the simple average ensemble method was used for modelling, with emphasis on how to construct the best training subsets and how to choose the number of submodels, and it was compared with ordinary PLS modelling. The results show that the prediction performance of the simple average ensemble method was the same as that of ordinary PLS. Further study shows that the standard deviation of each sample's predictions across the submodels, which the ensemble provides, may offer information for assessing the reliability of the predicted results.
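The simple average ensemble described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the training subsets are built, so random subsampling without replacement is assumed; each submodel is a single-response PLS (PLS1) model fitted with the NIPALS algorithm; and the function names `pls1_fit` and `simple_average_ensemble`, the number of submodels, the subset fraction, the number of latent variables and the synthetic data are all hypothetical. The sketch returns both the averaged prediction and the per-sample standard deviation across submodels.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    Hypothetical helper: returns the centring terms and the regression
    vector b so that y_hat = y_mean + (X - x_mean) @ b.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    E = X - x_mean          # centred spectra (X residuals)
    f = y - y_mean          # centred reference values (y residuals)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # unit-length weight vector
        t = E @ w                # score vector
        tt = t @ t
        p = E.T @ t / tt         # X loading
        c = (f @ t) / tt         # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.column_stack(W)
    P = np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return x_mean, y_mean, b


def simple_average_ensemble(X_train, y_train, X_new, n_submodels=20,
                            subset_fraction=0.8, n_components=3, seed=0):
    """Simple average ensemble of PLS1 submodels (hypothetical sketch).

    ASSUMPTION: each submodel is trained on a random subset drawn without
    replacement; the paper's own subset-construction rule may differ.
    Returns the per-sample mean prediction and the per-sample standard
    deviation across submodels (the possible reliability indicator).
    """
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    m = int(round(subset_fraction * n))
    preds = np.empty((n_submodels, X_new.shape[0]))
    for k in range(n_submodels):
        idx = rng.choice(n, size=m, replace=False)
        x_mean, y_mean, b = pls1_fit(X_train[idx], y_train[idx], n_components)
        preds[k] = y_mean + (X_new - x_mean) @ b
    # Every submodel gets the same weight 1/K in the final prediction.
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Synthetic stand-in for NIR spectra and one chemical value (hypothetical).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 50))
    y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)
    mean_pred, sd_pred = simple_average_ensemble(X[:60], y[:60], X[60:])
    print(np.round(mean_pred[:3], 3), np.round(sd_pred[:3], 3))
```

With `subset_fraction=1.0` every submodel sees the same samples and the spread collapses to zero, so in this sketch the standard deviation only carries information when the training subsets actually differ.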
Source
Computers and Applied Chemistry
CAS
CSCD
Peking University Core Journals
2010, No. 6, pp. 830-834 (5 pages)
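Funding
China Tobacco Guangdong Industrial Co., Ltd. (No. I05XM-QK[2008]017)
National Natural Science Foundation of China (No. 20875106)
Natural Science Foundation of Guangdong Province (No. 9151027501000003)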
Keywords
multivariate calibration
ensemble modelling
simple average method
partial least-squares
tobacco