Abstract
To reduce the complexity of model building and improve the prediction accuracy of near-infrared spectroscopy (NIRS) models, a two-stage correlation coefficient (TSCC) wavelength selection method was proposed. In the first stage, the correlation coefficient between each wavelength vector and the concentration vector was calculated, and the wavelengths with larger correlation coefficients were retained. In the second stage, the correlation coefficients among the retained wavelengths were calculated, and the wavelengths weakly correlated with the other wavelengths were chosen as modeling wavelengths. The method was validated on two public datasets: partial least squares regression (PLSR) models were built on the first-stage selection results, and multiple linear regression (MLR) models were built on the two-stage selection results. The MLR model built on the TSCC-selected wavelengths (TSCC-MLR) outperformed the full-spectrum PLSR model (Full-PLSR), the PLSR model built on the first-stage wavelengths (CC-PLSR), and the MLR model built on wavelengths selected by the successive projections algorithm (SPA-MLR). On the corn dataset, the coefficients of determination (R²) of the SPA-MLR, CC-PLSR and TSCC-MLR models were 0.8353, 0.8652 and 0.8951, respectively. On the soil NIR dataset, TSCC extracted 18 characteristic wavelengths, and the MLR model built on them reached a prediction-set R² (Rp²) of 0.9688, improving on the prediction performance of the CC-PLSR and SPA-MLR models. The results on both datasets show that the proposed TSCC wavelength selection method is an effective variable selection method.
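The abstract describes the two TSCC stages only qualitatively ("larger" correlation with concentration, "smaller" correlation with the other wavelengths), so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Stage 1 is assumed to keep a fixed number `n_first` of wavelengths ranked by |r| with concentration; stage 2 is assumed to be a greedy rule that accepts a wavelength only if its |r| with every already accepted wavelength is below `max_inter_corr`. Both parameters, the greedy rule, and the function names (`tscc_select`, `fit_mlr`, `predict_mlr`, `r2_score`, `demo`) are hypothetical. MLR is plain least squares with an intercept, and R² is evaluated on a held-out split of synthetic data, not the corn or soil datasets.

```python
import numpy as np


def tscc_select(X, y, n_first=10, max_inter_corr=0.9):
    """Two-stage correlation-coefficient wavelength selection (illustrative sketch).

    X: (n_samples, n_wavelengths) spectra; y: (n_samples,) concentrations.
    n_first and max_inter_corr are hypothetical tuning parameters; the
    abstract does not state how many wavelengths or which thresholds are used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Stage 1: Pearson correlation between each wavelength column and y
    # (assumes no constant columns, which would give a zero denominator).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r_y = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # Keep the n_first wavelengths with the largest |r|, strongest first.
    first = np.argsort(-np.abs(r_y))[:n_first]
    # Stage 2: correlation matrix among the stage-1 wavelengths.
    R = np.atleast_2d(np.corrcoef(X[:, first], rowvar=False))
    kept = []  # positions within `first`
    for i in range(len(first)):
        # Greedy rule (an assumption): accept a wavelength only if it is
        # weakly correlated with every wavelength accepted so far.
        if all(abs(R[i, j]) < max_inter_corr for j in kept):
            kept.append(i)
    return first[kept]


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def demo(seed=0):
    """Synthetic data: y depends on two independent constituents,
    each absorbing in its own 5-point band of a 200-point spectrum."""
    rng = np.random.default_rng(seed)
    n, p = 120, 200
    a, b = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    y = a + b
    X = rng.normal(0.0, 0.05, (n, p))  # noise at every wavelength
    X[:, 40:45] += a[:, None]    # band driven by constituent a
    X[:, 120:125] += b[:, None]  # band driven by constituent b
    train, test = slice(0, 80), slice(80, n)
    sel = tscc_select(X[train], y[train])
    coef = fit_mlr(X[train][:, sel], y[train])
    r2 = r2_score(y[test], predict_mlr(coef, X[test][:, sel]))
    return sel, r2


if __name__ == "__main__":
    sel, r2 = demo()
    print("selected wavelength indices:", sorted(int(i) for i in sel))
    print(f"test R^2 = {r2:.4f}")
```

On this synthetic data the five points inside each band are nearly collinear (r ≈ 0.97), while the two bands are independent, so stage 1 keeps the ten band wavelengths and stage 2 reduces them to one wavelength per band before the MLR fit. Ranking stage-1 wavelengths by |r| before the greedy pass is one design choice: within each collinear group it keeps the wavelength most correlated with y, and lowering `max_inter_corr` trades fewer, less redundant MLR variables against possible loss of information.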
Authors
WAN Yan (万岩)
CHEN Zhengguang (陈争光)
JIAO Feng (焦峰)
(College of Electrical and Information, Heilongjiang Bayi Agricultural University, Daqing 163319, China; Agriculture College, Heilongjiang Bayi Agricultural University, Daqing 163319, China)
Source
《分析试验室》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2023, No. 10, pp. 1332-1340 (9 pages)
Chinese Journal of Analysis Laboratory
Funding
Supported by the National Natural Science Foundation of China (41977202).
Keywords
near infrared spectroscopy
characteristic wavelength selection
correlation coefficient
multiple linear regression