Abstract
Near-infrared (NIR) spectroscopy is fast, non-destructive and easy to operate, and is therefore widely used in food analysis. As an indirect analytical technique, NIR requires a statistical calibration model relating spectra to the concentrations to be measured, so maintaining that model helps to ensure prediction accuracy. Changes in external conditions, such as changes in sample properties, in the instrument's functional response to physical and chemical indicators, or in environmental factors such as humidity and temperature, shift the spectra of identical samples and degrade the prediction accuracy of the original model. Recalibration removes the effect of the spectral shift but costs considerable labor and material. Calibration transfer corrects the spectral shift without recalibration and thereby improves prediction accuracy. Calibration transfer algorithms usually operate on the full spectrum, which is computationally expensive and does not identify chemically meaningful bands. This paper therefore proposes a variable selection method for calibration transfer, iterative interval backward selection (IIBS). IIBS first computes variable importance information for the intervals of the primary spectra (those used to build the model) and the secondary spectra (those that have shifted and must be corrected by calibration transfer), using regression coefficients (β), residual vectors (Res) or variable importance in projection (VIP). The geometric mean of the importance values of the variables in each interval is taken as that interval's importance, and the intervals with the lowest importance are removed. This procedure (compute variable importance, compute interval importance, remove low-importance intervals) is repeated iteratively for both the primary and the secondary spectra. Finally, the root mean squared error of validation (RMSEV) is computed for each combination of primary and secondary interval subsets, and the combination with the minimal RMSEV is chosen as optimal. The algorithm was tested on two NIR datasets, corn and wheat. The results show that, compared with the full spectrum, β, Res and VIP can each select fewer, chemically meaningful intervals from the primary and secondary spectra and improve calibration transfer accuracy. Among the variable importance vectors compared, selection based on β yielded smaller calibration transfer errors.
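The interval scoring and backward elimination loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `importance_fn` and `rmsev_fn` callbacks, the choice to drop exactly one interval per iteration, and the toy numbers are all assumptions made for the example. In practice `importance_fn` would refit a regression model on the retained variables and return |β|, Res or VIP values, and `rmsev_fn` would run the calibration transfer and score it on a validation set.

```python
import math


def geometric_mean(values):
    """Geometric mean of absolute values; a tiny epsilon guards against log(0)."""
    eps = 1e-12
    return math.exp(sum(math.log(abs(v) + eps) for v in values) / len(values))


def interval_importance(importance, intervals):
    """Score each interval by the geometric mean of its variables' importance.

    importance: dict mapping variable index -> importance value
    intervals:  dict mapping interval id -> list of variable indices
    """
    return {
        k: geometric_mean([importance[i] for i in idx])
        for k, idx in intervals.items()
    }


def backward_interval_path(importance_fn, intervals):
    """Drop the least important interval repeatedly; return every subset visited.

    importance_fn (hypothetical callback) receives the retained variable
    indices and returns a dict of their importance values.
    """
    active = dict(intervals)
    path = [sorted(active)]
    while len(active) > 1:
        kept = sorted(i for idx in active.values() for i in idx)
        scores = interval_importance(importance_fn(kept), active)
        worst = min(scores, key=scores.get)  # interval with the lowest score
        del active[worst]
        path.append(sorted(active))
    return path


def select_best(path_primary, path_secondary, rmsev_fn):
    """Pick the (primary, secondary) subset pair with the smallest RMSEV.

    rmsev_fn (hypothetical callback) scores one pair of interval subsets.
    """
    pairs = [(p, s) for p in path_primary for s in path_secondary]
    return min(pairs, key=lambda ps: rmsev_fn(ps[0], ps[1]))


# Toy data: six variables grouped into three intervals, fixed importance values.
intervals = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
fixed = {0: 0.9, 1: 0.8, 2: 0.01, 3: 0.02, 4: 0.3, 5: 0.4}


def toy_importance(kept):
    # Stand-in for refitting a model on the retained variables.
    return {i: fixed[i] for i in kept}


def toy_rmsev(primary, secondary):
    # Stand-in for validating a calibration transfer; not a real error measure.
    return abs(len(primary) - 2) + 0.5 * abs(len(secondary) - 1)


path = backward_interval_path(toy_importance, intervals)
best_primary, best_secondary = select_best(path, path, toy_rmsev)
```

Passing the importance measure in as a callback keeps the elimination loop independent of whether β, Res or VIP is used, which matches the way the abstract compares the three.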
Authors
ZHENG Kai-yi (郑开逸)
FENG Yu-hang (冯雨航)
ZHANG Wen (张文)
HUANG Xiao-wei (黄晓玮)
LI Zhi-hua (李志华)
ZHANG Di (张迪)
SHI Ji-yong (石吉勇)
ZOU Xiao-bo (邹小波)
School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, China
Source
Spectroscopy and Spectral Analysis (《光谱学与光谱分析》)
Indexed in: SCIE, EI, CAS, CSCD, Peking University Core Journals (北大核心)
2021, Issue 6, pp. 1789-1794 (6 pages)
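Funding
National Key Research and Development Program of China (2017YFD0400102)
National Natural Science Foundation of China (31972153)
National Postdoctoral Project of China (2019M661758)
Jiangsu Province Postdoctoral Project (2019K014)
Jiangsu University Foundation (19JDG010)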
Keywords
Near-infrared spectra
Calibration transfer
Variable selection
Regression coefficient
Residual error
Variable importance in projection (VIP)