Funding: Supported by the National Natural Science Foundation of China (42071420), the Major Special Project for 2025 Scientific and Technological Innovation (Major Scientific and Technological Task Project in Ningbo City) (2021Z048), and the National Key Research and Development Program of China (2019YFE0125300).
Abstract: Spectroscopy can be used to detect crop characteristics. A goal of crop spectrum analysis is to extract effective features from spectral data for building a detection model. An ideal spectral feature set should be highly sensitive to the target parameters while carrying little redundant information among features. However, feature-selection methods that satisfy both requirements are lacking. To address this issue, this study developed a novel method, the continuous wavelet projections algorithm (CWPA), which combines the advantages of continuous wavelet analysis (CWA) and the successive projections algorithm (SPA) to generate an optimal spectral feature set for crop detection. Three datasets collected for crop stress detection and retrieval of biochemical properties were used to validate the CWPA under both classification and regression scenarios. The CWPA generated a feature set with fewer features while achieving accuracy comparable to or even higher than that of CWA and SPA. With only two to three features identified by the CWPA, an overall accuracy of 98% was achieved in classifying tea plant stresses, and high coefficients of determination were obtained in retrieving corn leaf chlorophyll content (R² = 0.8521) and equivalent water thickness (R² = 0.9508). The mechanism of the CWPA ensures that the algorithm discovers the most sensitive features while retaining complementarity among them. Its ability to reduce data dimensionality suggests its potential for crop monitoring and phenotyping with hyperspectral data.
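The SPA component mentioned in the abstract can be illustrated with a minimal sketch of the textbook successive-projections step. This is not the CWPA itself, and it omits the wavelet stage entirely. The function name `spa_select` and the parameters `start` and `k` are hypothetical; in practice the starting column and the number of features are usually chosen by validation, and spectra are commonly mean-centered first.

```python
def spa_select(X, start, k):
    """Pick up to k column indices of X (a list of rows) with low collinearity.

    Illustrative textbook SPA step only; not the CWPA from the abstract.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Work on column vectors so each projection is a simple dot product.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]
    for _ in range(k - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        if ref_norm2 == 0:
            break  # nothing independent is left to project against
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove the component of column j that lies along ref.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
        # The column with the largest residual norm is the least redundant.
        best = max((j for j in range(n_cols) if j not in selected),
                   key=lambda j: sum(v * v for v in cols[j]))
        selected.append(best)
    return selected


# Toy data: column 1 is a multiple of column 0, so it is never preferred.
X_demo = [[1.0, 2.0, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.0],
          [0.0, 0.0, 0.0, 3.0]]
print(spa_select(X_demo, start=0, k=3))
```

Because each step keeps only what is orthogonal to the features already chosen, a column that merely rescales an earlier pick has zero residual norm and is skipped, which is the low-redundancy property the abstract relies on.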
Funding: Supported financially by the China State Forestry Administration "948" projects (2015-4-52) and the Heilongjiang Natural Science Foundation (C2017005).
Abstract: Identifying timber properties is important for safe application. Near-infrared spectroscopy (NIRS) is widely used because of its simplicity, efficiency, and positive environmental attributes. However, in its application, weak signals must be extracted from complex, overlapping, and changing information. This study focused on the stability of NIR modeling. Orthogonal partial least squares (OPLS) and the successive projections algorithm (SPA) are used to eliminate noise and extract effective spectra, and an ensemble learning method, MIX-PLS, is applied to establish the model. The elastic modulus of timber is taken as an example: 201 wood samples of three species, Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula, were divided into three groups to investigate modeling performance. The results show that OPLS can preprocess the near-infrared spectral information according to the target object in the presence of systematic error and reduce errors to a minimum. SPA finally selects 13 spectral bands, which simplifies the NIR spectral data and improves model accuracy. For the Mix Partial Least Squares (MIX-PLS) model, the Pearson correlation coefficients of calibration (Rc) and prediction (Rp) were 0.95 and 0.90, and the root mean square errors of calibration (RMSEC) and prediction (RMSEP) were 2.075 and 6.001, respectively, which shows that the model has good generalization ability.
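The figures of merit quoted above (Rc, Rp, RMSEC, RMSEP) can be computed as in the following minimal sketch. It assumes the plain definitions: Pearson correlation between reference and predicted values, and root mean square error with division by the number of samples. Some chemometrics packages apply a degrees-of-freedom correction to RMSEC, and the abstract does not say which variant was used. The function names are hypothetical.

```python
import math


def pearson_r(y_ref, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    mean_pred = sum(y_pred) / n
    cov = sum((a - mean_ref) * (b - mean_pred) for a, b in zip(y_ref, y_pred))
    sd_ref = math.sqrt(sum((a - mean_ref) ** 2 for a in y_ref))
    sd_pred = math.sqrt(sum((b - mean_pred) ** 2 for b in y_pred))
    return cov / (sd_ref * sd_pred)


def rmse(y_ref, y_pred):
    """Root mean square error; RMSEC on calibration data, RMSEP on test data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ref, y_pred)) / len(y_ref))
```

Applying both functions to the calibration set gives Rc and RMSEC, and applying them to held-out samples gives Rp and RMSEP; a large gap between the two pairs is the usual sign of overfitting.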
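Abstract: The protein content of milk affects its quality. This study examines the feasibility of predicting milk protein content from the spectral features of hyperspectral images. A predictive modeling method is proposed that combines competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA) with a multilayer feedforward neural network (back propagation, BP). Milk samples with different protein concentrations were studied, and a visible/near-infrared hyperspectral imaging system was used to collect 250 sets of hyperspectral data from five kinds of milk. Based on comparative experiments, standardization was chosen to preprocess the acquired absorption spectra. CARS combined with SPA was then used to select characteristic wavelengths, yielding 18 wavelengths on which the CARS-SPA-BP model was built. In testing, the coefficients of determination (R²) of the CARS-SPA-BP model for the training and test sets reached 0.971 and 0.968, respectively, and the root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP) reached 0.033 and 0.034, respectively. The multilayer feedforward neural network built on the wavelengths selected by CARS combined with SPA showed no marked decline in predictive performance compared with full-wavelength modeling, so CARS combined with SPA for wavelength selection, together with a BP neural network, is largely sufficient for predicting milk protein content. To verify the predictive ability of the CARS-SPA-BP model, the more traditional partial least squares regression (PLSR) was used for modeling on the same data. The results show that, compared with PLSR, CARS-SPA-BP improved markedly in both R² and RMSEP. The study shows that CARS-SPA-BP can make full use of milk spectral features to detect milk protein content with relatively high accuracy.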
Funding: Supported by the National Natural Science Foundation of China (20835002).
Abstract: Consensus methods are promising tools for improving the reliability of quantitative models in near-infrared (NIR) spectroscopic analysis. A strategy for improving the performance of consensus methods in multivariate calibration of NIR spectra is proposed. In this approach, for each variable in the spectra reduced by uninformative variable elimination (UVE), a subset of non-collinear variables is generated using the successive projections algorithm (SPA). Sub-models are then built using the variable subsets and the calibration subsets determined by Monte Carlo (MC) re-sampling, and the sub-model that produces the minimal cross-validation error is selected as a member model. By repeating the MC re-sampling, a series of member models is built, and a consensus model is obtained by averaging all the member models. Because member models are built with the best variable subset and a randomly selected calibration subset, both the quality and the diversity of the member models are ensured for the consensus model. Two NIR spectral datasets of tobacco lamina are used to investigate the proposed method. The superiority of the method in both accuracy and reliability is demonstrated.
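The resample, select, and average loop described above can be sketched as follows. This is a deliberately simplified illustration under stated assumptions, not the authors' procedure: each candidate sub-model is a one-variable least-squares line rather than a model on an SPA-generated subset, a single hold-out split stands in for cross validation, and UVE is omitted. All function and parameter names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def consensus_fit(X, y, n_members=20, frac=0.7, seed=0):
    """Build member models, one per Monte Carlo draw of the calibration set."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(frac * n)
        train, hold = idx[:cut], idx[cut:]
        best = None
        for j in range(p):  # one candidate sub-model per variable
            a, b = fit_line([X[i][j] for i in train], [y[i] for i in train])
            err = sum((y[i] - (a + b * X[i][j])) ** 2 for i in hold)
            if best is None or err < best[0]:
                best = (err, j, a, b)
        members.append(best[1:])  # keep (variable index, intercept, slope)
    return members


def consensus_predict(members, row):
    """Average the predictions of all member models for one sample."""
    return sum(a + b * row[j] for j, a, b in members) / len(members)
```

Selecting the best candidate within each draw addresses member quality, while the random calibration subsets supply the diversity that makes averaging worthwhile.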
Funding: Supported by the Major Scientific and Technological Project of Henan Province (Grant Nos. 182102310060, 161100110600), the Key Scientific and Technological Project of Henan Province (Grant No. 212102310491), the China Postdoctoral Science Foundation (Grant No. 2018M632767), the Henan Postdoctoral Science Foundation (Grant No. 001801021), and the Youth Talents Lifting Project of Henan Province (Grant No. 2018HYTP008).
Abstract: This study investigated the potential of hyperspectral imaging (900-1700 nm) for nondestructive determination of inosinic acid (IMP) in chicken. Hyperspectral images of chicken flesh samples were acquired, and the mean spectra within the images were extracted. The quantitative relationship between the mean spectra and the reference IMP values was fitted by partial least squares (PLS) regression. A PLS model (MAS-PLS) built with moving average smoothing (MAS) spectra showed better performance in predicting IMP content, with a correlation coefficient of prediction (R_P) of 0.951, a root mean square error of prediction (RMSEP) of 0.046 mg/g, and a residual predictive deviation (RPD) of 3.152. Regression coefficient (RC), successive projections algorithm (SPA), stepwise, competitive adaptive reweighted sampling (CARS), and uninformative variable elimination (UVE) methods were used to select optimal wavelengths for the MAS-PLS model. Based on the 18 optimal wavelengths (907.14, 917.02, 918.67, 926.90, 930.20, 936.78, 956.54, 1004.28, 1135.89, 1211.56, 1302.07, 1367.94, 1397.60, 1488.31, 1680.17, 1683.49, 1686.80, and 1695.10 nm) selected from the MAS spectra by SPA, the MAS-SPA-PLS model was built with an R_P of 0.920, an RMSEP of 0.056 mg/g, and an RPD of 3.220, which was similar to the MAS-PLS model. Overall, the study indicated that hyperspectral imaging in the 900-1700 nm range combined with PLS and SPA could be used to predict the IMP content of chicken flesh.
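Two of the quantities named above, moving average smoothing and RPD, are simple enough to sketch. The abstract gives neither the window width nor the exact RPD formula, so the code below assumes a centered window that shrinks at the spectrum edges and the common definition of RPD as the sample standard deviation of the reference values divided by RMSEP. The function names and the default `window=5` are hypothetical.

```python
import statistics


def moving_average(spectrum, window=5):
    """Centered moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def rpd(y_ref, rmsep):
    """Residual predictive deviation, assumed here to be SD(reference) / RMSEP."""
    return statistics.stdev(y_ref) / rmsep
```

Under this definition, RPD depends on the spread of the reference values as well as on RMSEP, so two models evaluated on different sample sets are not directly comparable by RPD alone.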
Abstract: In this work, two chemometric methods are applied to the modeling and prediction of the electrophoretic mobilities of some organic and inorganic compounds. The successive projections algorithm (SPA), a feature-selection strategy, is used for descriptor selection and model development. The support vector machine (SVM) and multiple linear regression (MLR) are then used to construct non-linear and linear quantitative structure-property relationship (QSPR) models, respectively. A comparison of the results obtained with the two models reveals that the SVM model has much better predictive value than the MLR model. The root-mean-square errors for the training set and the test set were 0.1911 and 0.2569, respectively, for the SVM model, and 0.4908 and 0.6494, respectively, for the MLR model. The results show that the SVM model drastically enhances predictive ability in QSPR studies and is superior to the MLR model.