Abstract
To model the relationship between physical or chemical property parameters and near-infrared (NIR) spectroscopic data, a Boosting-PLS algorithm for multivariate calibration is proposed, based on the idea of building a series of regressors. Each base (weak) regressor is trained on a subset of the original calibration set. Each subset is drawn from the calibration set by sampling with replacement according to per-sample probabilities, which are determined by the prediction errors of the previous regressor. Samples with large errors receive higher probabilities, so that subsequent regressors concentrate on them. The final ensemble prediction is the weighted median of the weak regressors' predictions. An NIR application example and a comparison with partial least squares (PLS) indicate that Boosting-PLS yields a more accurate and more robust calibration model that is less sensitive to overfitting.
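The resampling, reweighting, and weighted-median steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the exact update rule, so the code assumes an AdaBoost.R2-style scheme with a linear loss, and it uses a PLS1 model with a single latent variable as the weak regressor to stay dependency-free. The names `fit_pls1`, `weighted_median`, and `boosting_pls` are hypothetical.

```python
import math
import random


def fit_pls1(X, y):
    """Fit a PLS1 model with one latent variable; return a predict function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0  # guard degenerate subsets
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # latent scores
    q = sum(a * b for a, b in zip(t, yc)) / (sum(a * a for a in t) or 1.0)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half, running = 0.5 * sum(weights), 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def boosting_pls(X, y, n_rounds=20, seed=0):
    """Assumed AdaBoost.R2-style boosting around the one-component PLS above."""
    rng = random.Random(seed)
    n = len(X)
    prob = [1.0 / n] * n  # sampling probability of each calibration sample
    ensemble = []         # pairs of (predict function, confidence weight)
    for _ in range(n_rounds):
        # Draw a subset of the calibration set with replacement, by probability.
        idx = rng.choices(range(n), weights=prob, k=n)
        model = fit_pls1([X[i] for i in idx], [y[i] for i in idx])
        # Evaluate the new regressor on the whole calibration set.
        errs = [abs(model(X[i]) - y[i]) for i in range(n)]
        d_max = max(errs) or 1.0
        loss = [e / d_max for e in errs]  # linear loss scaled to [0, 1]
        avg_loss = sum(p * l for p, l in zip(prob, loss))
        if avg_loss >= 0.5:  # regressor too weak: stop adding members
            break
        beta = max(avg_loss, 1e-12) / (1.0 - avg_loss)
        ensemble.append((model, math.log(1.0 / beta)))
        # Well-predicted samples are shrunk most, so after normalisation
        # the poorly predicted samples gain probability.
        prob = [p * beta ** (1.0 - l) for p, l in zip(prob, loss)]
        total = sum(prob)
        prob = [p / total for p in prob]
    if not ensemble:  # fall back to a single model on the full set
        ensemble.append((fit_pls1(X, y), 1.0))

    def predict(x):
        preds = [m(x) for m, _ in ensemble]
        return weighted_median(preds, [a for _, a in ensemble])

    return predict
```

Calling `boosting_pls(X, y)` with `X` as a list of spectra (lists of floats) and `y` as the property values returns a function that predicts the property for a new spectrum. In practice the weak regressor would be a full PLS model with a chosen number of latent variables; the single-component version here only keeps the sketch short.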
Source
《计算机与应用化学》
CAS
CSCD
PKU Core Journals (北大核心)
2010, No. 2, pp. 241-244 (4 pages)
Computers and Applied Chemistry
Funding
Sichuan Province Youth Science and Technology Foundation (09ZQ026-066)
Yibin University Doctoral Research Start-up Fund (2008B06)