Abstract: With the development of mid-infrared (MIR) photoelectric devices, MIR spectroscopy has become an important method for non-invasive blood glucose detection. The MIR region (4000 - 400 cm<sup>-1</sup>) contains the well-known fingerprint region of glucose (1200 - 800 cm<sup>-1</sup>), which shows clearer characteristic absorption peaks and better specificity, so MIR spectra carry abundant molecular information about glucose. Non-invasive blood glucose detection by MIR spectroscopy must reach a certain accuracy, and the quantitative model is an important factor affecting that accuracy. In this paper, simulated solutions containing only glucose and simulated mixed solutions are taken as the research objects, and their MIR spectra are collected. For the 3000 - 900 cm<sup>-1</sup> band, a full-spectrum partial least squares regression (PLSR) model, an SNV + Ctr-PLSR model, an MSC + Ctr-PLSR model, and a convolutional neural network (CNN) model were constructed. For the 1200 - 900 cm<sup>-1</sup> band, a full-spectrum PLS model and a CNN model were constructed. The experimental results show that CNN is the optimal model in both bands: for the 3000 - 900 cm<sup>-1</sup> band, the correlation coefficient of the prediction set (Rp) is 0.95 and the root mean square error of the prediction set (RMSEP) is 22.10; for the 1200 - 900 cm<sup>-1</sup> band, Rp is 0.95 and RMSEP is 22.54. These results show that CNN is a promising method that achieves higher accuracy than PLSR and is especially suitable for modeling the complex environment of the human body. In addition, the study provides a theoretical and practical basis for using CNN in feature selection and model interpretation.
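The preprocessing and evaluation terms used in the abstract above can be illustrated with a minimal sketch. It assumes that SNV refers to the standard normal variate transform (each spectrum centered and scaled by its own standard deviation), that RMSEP is the root mean square error over the prediction set, and that Rp is the Pearson correlation between reference and predicted values. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it by its own std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator); this convention is an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Invented reference and predicted glucose values, for illustration only.
y_ref = [100.0, 150.0, 200.0, 250.0]
y_hat = [110.0, 140.0, 210.0, 240.0]

print(snv([1.0, 2.0, 3.0]))       # a toy three-point "spectrum"
print(rmsep(y_ref, y_hat))        # every error is 10, so RMSEP is 10.0
print(pearson_r(y_ref, y_hat))    # Rp for the toy predictions
```

Because RMSEP is expressed in the units of the reference values, it is only comparable between models evaluated on the same prediction set, whereas Rp is unitless.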
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, the least absolute shrinkage and selection operator (LASSO) and random forest, and are then filtered and combined with empirical variables related to the fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR) model. Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It performed significantly better than variable selection based on empirical knowledge alone and in most cases outperformed the baseline method. On all three datasets, the hybrid strategy combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values, 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables (1.959, 3.718 and 2.181, respectively). The LASSO-PLSR model with empirical support and 20 selected variables exhibited significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. These results demonstrate that using empirical knowledge to support data-driven variable selection can be a viable approach to improving the accuracy and reliability of LIBS quantification.
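The select, combine, and deselect steps described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it ranks variables by absolute Pearson correlation only, whereas the paper also uses mutual information, LASSO and random forest, and it stops before the PLSR step. The function `hybrid_select`, its parameters, and the toy data are all hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def hybrid_select(X, y, empirical_idx, k, deselect_idx=()):
    """Combine data-driven and empirical variable indices.

    X: list of samples, each a list of spectral intensities (one per variable).
    y: reference values (for example, ash content), one per sample.
    empirical_idx: indices chosen from domain knowledge (assumed given).
    k: number of top-ranked data-driven variables to keep.
    deselect_idx: indices removed manually as unrelated to the target.
    """
    n_vars = len(X[0])
    scores = []
    for j in range(n_vars):
        column = [row[j] for row in X]
        scores.append((abs(pearson_r(column, y)), j))
    # Data-driven step: keep the k variables most correlated with y.
    top_k = {j for _, j in sorted(scores, reverse=True)[:k]}
    # Hybrid step: add the empirical variables, then drop the deselected ones.
    selected = (top_k | set(empirical_idx)) - set(deselect_idx)
    return sorted(selected)


# Invented toy data: 5 samples, 4 variables.
X = [
    [2.0, 3.0, 1.0, 2.0],
    [4.0, 1.0, 0.0, 2.0],
    [6.0, 4.0, 1.0, 2.0],
    [8.0, 1.0, 0.0, 2.0],
    [10.0, 5.0, 1.0, 2.0],
]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

print(hybrid_select(X, y, empirical_idx=[3], k=2))
print(hybrid_select(X, y, empirical_idx=[3], k=2, deselect_idx=[1]))
```

In a fuller pipeline, the returned indices would be used to subset the spectra before fitting a regression model such as PLSR.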
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.