Abstract
Near-infrared spectroscopy (NIRS) is a secondary analysis method, so the reproducibility and reliability of its quantitative results depend heavily on the modelling process. Taking an NIRS quantitative model for wheat protein as an example, this paper studies the outlier problem in multivariate calibration modelling, with the aim of discussing how individual samples influence a model built from a complex sample set. The criterion for the presence of outliers is the divergence between the explained-variance percentage curves for calibration and validation in partial least squares regression (PLSR) modelling: when the two curves deviate significantly, the sample set is considered to contain outliers that have a significant effect on the model. The main novel contribution is the identification, treatment, and influence analysis of outliers. A statistical method that traverses sub-models built by sample deletion identifies and extracts outliers progressively. In the predictions of the model built after outlier removal, the standard deviation of the prediction residuals serves as the reference distance for grading the degree of deviation, dividing outliers into significant, relative, and potential outliers. In the data set studied, significant outliers account for about 7.8% of samples and relative outliers for about 15.6%. Outliers affect the model through the prediction residuals of normal samples: they pull predicted values away from the ideal fitting line and increase their dispersion. Either removing outliers or modelling with sample weights effectively suppresses this influence, so that the model's fit favours the majority of the samples and its empirical risk error is reduced.
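The grading step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract says the residual standard deviation is the reference distance but does not give the cut-offs, so the multiples 1, 2, and 3 used here are hypothetical, as are the names `reference_sd` and `grade_outliers`.

```python
from statistics import stdev


def reference_sd(retained_residuals):
    """Standard deviation of prediction residuals for the retained samples,
    i.e. those left after outlier removal. Used as the reference distance."""
    return stdev(retained_residuals)


def grade_outliers(residuals, sd, cutoffs=(1.0, 2.0, 3.0)):
    """Grade each sample by |residual| / sd.

    The cutoffs are ASSUMED multiples of the reference distance;
    the abstract does not state the thresholds actually used.
    """
    if sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    potential, relative, significant = cutoffs
    grades = []
    for r in residuals:
        distance = abs(r) / sd
        if distance > significant:
            grades.append("significant")
        elif distance > relative:
            grades.append("relative")
        elif distance > potential:
            grades.append("potential")
        else:
            grades.append("normal")
    return grades


if __name__ == "__main__":
    # Made-up residuals (predicted minus reference protein content).
    retained = [0.10, -0.20, 0.15, -0.05, 0.20, -0.20]
    sd = reference_sd(retained)
    print(grade_outliers([0.05, -0.30, 0.50, -0.90], sd))
```

Computing the reference distance from the retained samples only, as the abstract describes, keeps the outliers themselves from inflating the scale against which they are graded.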
Authors
ZHENG Feng (郑峰)
LIU Li-ying (刘丽莹)
LIU Xiao-xi (刘小溪)
LI Ye (李野)
SHI Xiao-guang (石晓光)
ZHANG Guo-yu (张国玉)
HUAN Ke-wei (宦克为)
Affiliations: Changchun University of Science and Technology, Changchun 130022, China; Institute of Scientific and Technical Information in Jilin Province, Changchun 130000, China
Source
Spectroscopy and Spectral Analysis (《光谱学与光谱分析》)
SCIE
EI
CAS
CSCD
Peking University Core Journals
2016, Issue 11, pp. 3523-3529 (7 pages)
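Funding
China Special Fund for Meteorological Research in the Public Interest, 2014 (GYHY201406037)
Specialized Research Fund for the Doctoral Program of Higher Education, jointly funded project, 2011 (20112216110006)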
Keywords
Near infrared spectroscopy
Outlier analysis (sample influence)
Gray system
Sub-model population learning