Abstract
Variable selection is often used to optimize linear calibration models for near-infrared (NIR) spectroscopy: it eliminates redundant information and improves the accuracy and interpretability of the regression. This paper designs a Monte Carlo-based method that evaluates the best performance different linear calibration methods (such as partial least squares regression, PLSR) can reach within the subspaces defined by variable selection, and thereby finds the limit to which variable selection can optimize a linear calibration model. The method obtains the distribution of a validation metric, the root mean square error of prediction (RMSEP), which reveals both the optimization effect and the optimization limit of variable selection on a dataset. Applied to NIR spectral modeling of three sample sets, the method gave an optimizable rate of about 24.98% on the tobacco-pectin dataset, with RMSEP reduced by 15.2%; about 13.90% on the wheat-protein dataset, with RMSEP reduced by 9.5%; and about 14.05% on the corn-starch dataset, with RMSEP reduced by 57.1%. The method quickly yields the optimization limit of variable selection for a given model and provides a reference for the design, application, and evaluation of variable selection methods.
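The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: draw random variable subsets, fit a linear calibration on each, and collect the RMSEP values into a distribution. Several things here are assumptions rather than the authors' procedure: ordinary least squares stands in for PLSR, the data are synthetic, the "optimizable rate" is taken to mean the share of random subsets whose RMSEP falls below the full-variable baseline, and all function and parameter names (`monte_carlo_rmsep`, `n_draws`, and so on) are hypothetical.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict(X_cal, y_cal, X_val):
    """Least-squares linear calibration with intercept.
    Assumption: a simple stand-in for PLSR, not the paper's model."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    return np.column_stack([np.ones(len(X_val)), X_val]) @ coef

def monte_carlo_rmsep(X_cal, y_cal, X_val, y_val, n_draws=500, seed=0):
    """Sample random variable subsets and record the RMSEP of each."""
    rng = np.random.default_rng(seed)
    n_vars = X_cal.shape[1]
    # Baseline: model built on all variables (no selection).
    baseline = rmsep(y_val, fit_predict(X_cal, y_cal, X_val))
    errors = np.empty(n_draws)
    for i in range(n_draws):
        k = rng.integers(1, n_vars + 1)                 # random subset size
        idx = rng.choice(n_vars, size=k, replace=False)  # random subset
        pred = fit_predict(X_cal[:, idx], y_cal, X_val[:, idx])
        errors[i] = rmsep(y_val, pred)
    # Assumed definition: fraction of subsets that beat the baseline.
    rate = float(np.mean(errors < baseline))
    # Relative RMSEP reduction of the best sampled subset vs. baseline.
    reduction = float((baseline - errors.min()) / baseline)
    return baseline, errors, rate, reduction

# Synthetic demonstration data (not from the paper): 20 variables,
# of which only the first 3 carry information about y.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=100)
X_cal, y_cal, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

baseline, errors, rate, reduction = monte_carlo_rmsep(X_cal, y_cal, X_val, y_val)
print(f"baseline RMSEP: {baseline:.3f}")
print(f"best sampled RMSEP: {errors.min():.3f}")
print(f"share of subsets below baseline: {rate:.2%}")
```

A histogram of `errors` would correspond to the kind of RMSEP distribution plot the abstract mentions; the lower tail of that distribution is what this sketch treats as an estimate of the optimization limit.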
Authors
潘正豪
王鹏
陈昆燕
李秋潼
唐杰
杨俊
邵利民
PAN Zheng-hao; WANG Peng; CHEN Kun-yan; LI Qiu-tong; TANG Jie; YANG Jun; SHAO Li-min (School of Chemistry and Material Science, University of Science and Technology of China, Hefei 230026, China; Technical Center, China Tobacco Chongqing Industrial Co., Ltd., Chongqing 400060, China)
Source
《分析测试学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2023, Issue 12, pp. 1659-1665 (7 pages)
Journal of Instrumental Analysis
Keywords
chemometrics
near-infrared spectroscopy
chemical calibration
linear model
variable selection
Monte Carlo method