As important components of air pollutant,volatile organic compounds(VOCs)can cause great harm to environment and human body.The concentration change of VOCs should be focused on in real-time environment monitoring sys...As important components of air pollutant,volatile organic compounds(VOCs)can cause great harm to environment and human body.The concentration change of VOCs should be focused on in real-time environment monitoring system.In order to solve the problem of wavelength redundancy in full spectrum partial least squares(PLS)modeling for VOCs concentration analysis,a new method based on improved interval PLS(iPLS)integrated with Monte-Carlo sampling,called iPLS-MC method,was proposed to select optimal characteristic wavelengths of VOCs spectra.This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling.The wavelength combination with the best prediction result in regression model is selected as the characteristic wavelengths of the spectrum.Different wavelength selection methods were built,respectively,on Fourier transform infrared(FTIR)spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory.When the interval number of iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times,the characteristic wavelengths selected by iPLS-MC method can reduce from 8916 to 10,which occupies only 0.22%of the full spectrum wavelengths.While the RMSECV and correlation coefficient(Rc)for ethylene are 0.2977 and 0.9999 ppm,and those for ethanol gas are 0.2977 ppm and 0.9999.The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOCs FTIR spectra stably and effectively,and the prediction performance of the regression model can be significantly improved and simplified by using characteristic wavelengths.展开更多
Laser-induced breakdown spectroscopy(LIBS)has become a widely used atomic spectroscopic technique for rapid coal analysis.However,the vast amount of spectral information in LIBS contains signal uncertainty,which can a...Laser-induced breakdown spectroscopy(LIBS)has become a widely used atomic spectroscopic technique for rapid coal analysis.However,the vast amount of spectral information in LIBS contains signal uncertainty,which can affect its quantification performance.In this work,we propose a hybrid variable selection method to improve the performance of LIBS quantification.Important variables are first identified using Pearson's correlation coefficient,mutual information,least absolute shrinkage and selection operator(LASSO)and random forest,and then filtered and combined with empirical variables related to fingerprint elements of coal ash content.Subsequently,these variables are fed into a partial least squares regression(PLSR).Additionally,in some models,certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance.The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method.It is significantly better than the variable selection only method based on empirical knowledge and in most cases outperforms the baseline method.The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction(RMSEP)values of 1.605,3.478 and 1.647,respectively,which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables,which are 1.959,3.718 and 2.181,respectively.The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection,with RMSEP values dropping from 1.635,3.962 and 1.647 to 1.483,3.086 and 1.567,respectively.Such results demonstrate that using empirical knowledge as a support for datadriven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.展开更多
Accurately approximating higher order derivatives is an inherently difficult problem. It is shown that a random variable shape parameter strategy can improve the accuracy of approximating higher order derivatives with...Accurately approximating higher order derivatives is an inherently difficult problem. It is shown that a random variable shape parameter strategy can improve the accuracy of approximating higher order derivatives with Radial Basis Function methods. The method is used to solve fourth order boundary value problems. The use and location of ghost points are examined in order to enforce the extra boundary conditions that are necessary to make a fourth-order problem well posed. The use of ghost points versus solving an overdetermined linear system via least squares is studied. For a general fourth-order boundary value problem, the recommended approach is to either use one of two novel sets of ghost centers introduced here or else to use a least squares approach. When using either ghost centers or least squares, the random variable shape parameter strategy results in significantly better accuracy than when a constant shape parameter is used.展开更多
基金supported by National Key Scientific Instrument and Equipment Development Project of China,Grant Nos.2013YQ220643the National 863 Program of China,Grant Nos.2014AA06A503.
文摘As important components of air pollutant,volatile organic compounds(VOCs)can cause great harm to environment and human body.The concentration change of VOCs should be focused on in real-time environment monitoring system.In order to solve the problem of wavelength redundancy in full spectrum partial least squares(PLS)modeling for VOCs concentration analysis,a new method based on improved interval PLS(iPLS)integrated with Monte-Carlo sampling,called iPLS-MC method,was proposed to select optimal characteristic wavelengths of VOCs spectra.This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling.The wavelength combination with the best prediction result in regression model is selected as the characteristic wavelengths of the spectrum.Different wavelength selection methods were built,respectively,on Fourier transform infrared(FTIR)spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory.When the interval number of iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times,the characteristic wavelengths selected by iPLS-MC method can reduce from 8916 to 10,which occupies only 0.22%of the full spectrum wavelengths.While the RMSECV and correlation coefficient(Rc)for ethylene are 0.2977 and 0.9999 ppm,and those for ethanol gas are 0.2977 ppm and 0.9999.The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOCs FTIR spectra stably and effectively,and the prediction performance of the regression model can be significantly improved and simplified by using characteristic wavelengths.
基金financial supports from National Natural Science Foundation of China(No.62205172)Huaneng Group Science and Technology Research Project(No.HNKJ22-H105)Tsinghua University Initiative Scientific Research Program and the International Joint Mission on Climate Change and Carbon Neutrality。
文摘Laser-induced breakdown spectroscopy(LIBS)has become a widely used atomic spectroscopic technique for rapid coal analysis.However,the vast amount of spectral information in LIBS contains signal uncertainty,which can affect its quantification performance.In this work,we propose a hybrid variable selection method to improve the performance of LIBS quantification.Important variables are first identified using Pearson's correlation coefficient,mutual information,least absolute shrinkage and selection operator(LASSO)and random forest,and then filtered and combined with empirical variables related to fingerprint elements of coal ash content.Subsequently,these variables are fed into a partial least squares regression(PLSR).Additionally,in some models,certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance.The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method.It is significantly better than the variable selection only method based on empirical knowledge and in most cases outperforms the baseline method.The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction(RMSEP)values of 1.605,3.478 and 1.647,respectively,which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables,which are 1.959,3.718 and 2.181,respectively.The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection,with RMSEP values dropping from 1.635,3.962 and 1.647 to 1.483,3.086 and 1.567,respectively.Such results demonstrate that using empirical knowledge as a support for datadriven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
文摘Accurately approximating higher order derivatives is an inherently difficult problem. It is shown that a random variable shape parameter strategy can improve the accuracy of approximating higher order derivatives with Radial Basis Function methods. The method is used to solve fourth order boundary value problems. The use and location of ghost points are examined in order to enforce the extra boundary conditions that are necessary to make a fourth-order problem well posed. The use of ghost points versus solving an overdetermined linear system via least squares is studied. For a general fourth-order boundary value problem, the recommended approach is to either use one of two novel sets of ghost centers introduced here or else to use a least squares approach. When using either ghost centers or least squares, the random variable shape parameter strategy results in significantly better accuracy than when a constant shape parameter is used.