Near-infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) rice from wild-type (Zhonghua 11) rice. Rice lines transformed with a protein gene (OsTCTP) and a regulatory gene (Osmi166) were also discriminated from each other by the NIRS method. The performance of PLS-DA in the spectral ranges 4000-8000 cm⁻¹ and 4000-10000 cm⁻¹ was compared to find the optimal range. Transgenic and wild-type rice were distinguished from each other in the 4000-10000 cm⁻¹ range, with a correct classification rate of 100.0% in the validation test. The transgenic lines TCTP and mi166 were also distinguished from each other in the same range, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used to discriminate transgenic rice.
The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and alcohol content is essential. It is also difficult, because the components of such samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. With the PLS-DA method, the recognition rates in the modeling and validation processes were 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with principal component linear discriminant analysis. NIR spectroscopy combined with PLS-DA provides a quick and effective means of discriminant analysis of liquor brands and is a promising tool for large-scale inspection of liquor food safety.
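The Kennard-Stone division mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kennard_stone`, the toy two-variable "spectra", and the choice of Euclidean distance are assumptions made for illustration. The rule seeds the calibration set with the two most distant samples, then repeatedly adds the sample whose nearest already-selected neighbour is farthest away, which spreads the calibration set evenly over the data.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")
    # Pairwise Euclidean distances between all samples (assumed metric).
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical toy data: five "spectra" with two variables each.
spectra = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.5), (0.9, 0.1)]
calibration_idx = kennard_stone(spectra, 3)
prediction_idx = [k for k in range(len(spectra)) if k not in calibration_idx]
```

Samples not selected for calibration form the prediction set; with real spectra the distances would typically be computed on the full spectrum or on principal component scores.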
Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber range of 4348 cm⁻¹ to 9091 cm⁻¹, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for both the training set and the test set, and the lowest detected concentration of malathion residues in water was 1 μg·mL⁻¹. Kernel partial least squares-discriminant analysis performed well in classifying data from nonlinear systems. It was inferred that near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis has potential for rapid screening of other pesticide residues in water.
Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance of our method with that of single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and magnitude of gene effects influence the statistical power, accuracy and precision of effect estimation by the joint association analysis.
Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a brand product of proprietary Chinese medicine (PCM). A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400-1880 & 2088-2346 nm. Using random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L<sup>-1</sup> and 0.888, respectively. The results indicate that the NIR predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantifying polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped SNPs and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to non-small cell lung cancer GWAS data.
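The Bonferroni correction referred to in the abstract above can be sketched as follows. This is a generic illustration, not the authors' code: the function name `bonferroni_reject` and the p-values are hypothetical. Each of the m single-locus tests is compared against alpha / m, which controls the family-wise type I error but becomes conservative when the SNPs in a set are correlated.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    # Each test must pass the stricter per-test threshold alpha / m.
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical single-locus p-values for four SNPs in one SNP set.
p_values = [0.001, 0.012, 0.03, 0.2]
flags = bonferroni_reject(p_values)
```

Here the per-test threshold is 0.05 / 4 = 0.0125, so the SNP with p = 0.03 is no longer declared significant even though it would pass an uncorrected 0.05 cut-off.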
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based on empirical knowledge alone and in most cases outperforms the baseline method. On the three datasets, the hybrid strategy combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than the values of 1.959, 3.718 and 2.181 obtained from multiple linear regression using only 12 empirical variables. The LASSO-PLSR model with empirical support and 20 selected variables performed significantly better after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. These results demonstrate that using empirical knowledge to support data-driven variable selection can be a viable approach to improving the accuracy and reliability of LIBS quantification.
As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Real-time environmental monitoring systems should therefore track changes in VOC concentration. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select the optimal characteristic wavelengths of VOC spectra. The method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the iPLS-MC method reduces the number of characteristic wavelengths from 8916 to 10, which is only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that using characteristic wavelengths significantly improves and simplifies the regression model.
Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technology that requires no sample preparation, which makes it very suitable for on-line analysis of alloy composition. In the copper smelting industry, analysis and control of copper alloy concentration greatly affect product quality, so LIBS is an efficient quantitative analysis technology for this industry. For lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects. It is therefore difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both major and trace elements were quantitatively analyzed. Comparing the results of the two calibration methods showed that, for both major and trace elements, the PLS method was better than the CC method in quantitative analysis. The regression coefficients of the PLS method were also compared with the original spectral data with background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results showed that the PLS method applied to LIBS is suitable for simultaneous quantitative analysis of elements at different content levels in the copper smelting industry.
In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor is sufficient to describe land use patterns quantitatively, and most of the statistical relations derived from it agree with the facts. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Rapid and sensitive recognition of herbal pieces according to their concocting process is crucial to quality control and pharmaceutical effect. Near-infrared (NIR) and mid-infrared (MIR) spectroscopy combined with supervised pattern recognition based on partial least-squares discriminant analysis (PLSDA) was used to classify and recognize six differently concocted types of pieces among 600 Areca catechu L. samples, and the influence of fingerprint preprocessing methods on recognition performance was also investigated. For NIR spectra, the PLSDA models achieved recognition rates of 99.24%, 100% and 99.49% for the original fingerprint, the multiplicative scatter correction (MSC) fingerprint and the second-derivative fingerprint, respectively. For MIR spectra, a perfect recognition rate of 100% was obtained for all three fingerprint models. In conclusion, PLSDA can rapidly and effectively extract distinguishing fingerprint information from NIR and MIR spectra to identify differently concocted herbal pieces of A. catechu.
Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with a strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52% vol alcohol content). Liquors of 10 other brands with a strong fragrant flavor were used as the interfering brands (200 samples, positive, 52% vol alcohol content). The Kennard-Stone algorithm was used to divide the modeling samples to achieve uniformity and representativeness. Based on MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. For practical choice, this set contained five types of wavebands corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor. Using five selected simple models with 4775-4239, 7804-6569, 6264-5844, 9435-7896, and 12066-10373 cm⁻¹, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
High-end wine is made from high-quality grape varieties and yeast strains through a unique process. It is rich in nutrients and has a unique taste and a fragrant scent. Brand identification of wine is difficult and complex because different wines are highly similar. In this paper, visible and near-infrared (Vis-NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent and two composite wavebands: visible (400-780 nm), short-NIR (780-1100 nm), long-NIR (1100-2498 nm), whole NIR (780-2498 nm) and whole scanning region (400-2498 nm). In independent validation, all five models achieved good discrimination. The visible region model performed best, with validation recognition-accuracy rates of 100%, 95.6% and 97.5% for negative, positive and total samples, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.
For quality control purposes, an approach for fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established using a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH = 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r² > 0.9995), repeatability (RSD < 3.1%), intra- and inter-day precision (RSD < 1.8%) with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification by manufacturer were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.
As one of the important indicators of a spectrometer, the signal-to-noise ratio (SNR) reflects the ability of the spectrometer to detect weak signals. To investigate the influence of SNR on the prediction accuracy of spectral analysis, we first introduce the major factors affecting the spectral SNR. Taking green tea as an example, the influence of spectral SNR on the prediction accuracy of an origin identification model is analyzed experimentally, and the relationship between the spectral SNR and the prediction accuracy of the spectral analysis model is fitted. On this basis, common methods for improving the spectral SNR are discussed. The results show that, as the spectral SNR decreases, the accuracy of the model on the prediction set first decreases slowly, then decreases linearly, and finally levels off. By calculation, for the prediction model to reach prediction accuracies of 90% and 85%, the spectral SNR must be higher than 23.42 dB and 21.16 dB, respectively. The overall results provide parameter support for the development of new online analytical spectroscopic instruments, especially for the technical specification of SNR.
The identification of soy sauce adulteration can prevent fraud and protect the rights and interests of producers and consumers. Based on two measurement modes (1 mm and 10 mm), visible and near-infrared (Vis-NIR) spectroscopy combined with standard normal variate-partial least squares-discriminant analysis (SNV-PLS-DA) was used to establish discriminant analysis models for adulterated and brewed soy sauces. Chubang soy sauce was selected as the identification brand (negative, 70 samples). The adulterated samples (positive, 72 samples) were prepared by mixing Chubang soy sauce and blended soy sauce at different adulteration rates. The "blended soy sauce" was concocted from salt water (NaCl), monosodium glutamate (C<sub>5</sub>H<sub>10</sub>NNaO<sub>5</sub>) and caramel color (C<sub>6</sub>H<sub>8</sub>O<sub>3</sub>). A rigorous calibration-prediction-validation sample design was adopted. For the 1 mm case, five waveband models (visible, short-NIR, long-NIR, whole NIR and whole scanning regions) were established; for the 10 mm case, three waveband models with unsaturated absorption (visible, short-NIR and visible-short-NIR regions) were established. In independent validation, the models of all wavebands in both the 1 mm and 10 mm cases achieved good discrimination. For the 1 mm case, the visible model achieved the best validation result, with a validation recognition-accuracy rate (RAR<sub>V</sub>) of 99.6%; for the 10 mm case, both the visible and visible-short-NIR models achieved the best validation result (RAR<sub>V</sub> = 100%). The detection method requires no reagents and is fast and simple, which makes it easy to put into practice. The results can provide a valuable reference for designing small dedicated spectrometers with different measurement modes and different spectral regions.
Funding (transgenic rice NIRS study): Supported by the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Funding (joint association analysis study): Supported by grants from the National Program on the Development of Basic Research (2011CB100100), the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103), the National High-tech R&D Program (863 Program) (2014AA10A601-5), the Natural Science Foundations of Jiangsu Province (BK20150010), the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005), and the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352).
Funding (GWAS SNP-set study): Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Abstract: With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
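The Bonferroni correction applied to the single-locus LR baseline can be illustrated with a short sketch. This is not the authors' code: the function name `bonferroni_reject` and the example p-values are hypothetical, and the rule shown is the standard one of comparing each p-value against alpha divided by the number of tests.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where the test remains
    significant after Bonferroni correction (p < alpha / m)."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test significance level
    return [p < threshold for p in p_values]


# Hypothetical p-values for five SNPs in one SNP set (illustrative only).
example_p = [0.001, 0.02, 0.04, 0.3, 0.009]
decisions = bonferroni_reject(example_p, alpha=0.05)
```

With five tests the per-test threshold is 0.01, so p-values such as 0.02 and 0.04 that would pass an uncorrected 0.05 level are no longer rejected. This is the conservativeness the abstract attributes to the corrected LR.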
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than the variable selection method based only on empirical knowledge and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which are 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
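The RMSEP figure of merit reported above is the square root of the mean squared difference between reference and predicted values on a prediction set. A minimal sketch follows. The function name `rmsep` and the numbers are hypothetical and are not taken from the coal datasets.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_error / len(y_true))


# Hypothetical reference and predicted ash contents (illustrative only).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
error = rmsep(measured, predicted)
```

RMSEP carries the same unit as the predicted quantity, so values from different datasets are comparable only when the reference ranges are similar.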
Funding: Supported by the National Key Scientific Instrument and Equipment Development Project of China (Grant No. 2013YQ220643) and the National 863 Program of China (Grant No. 2014AA06A503).
Abstract: As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Real-time environmental monitoring systems should therefore track changes in VOC concentration. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select optimal characteristic wavelengths of VOC spectra. This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the number of characteristic wavelengths selected by the iPLS-MC method is reduced from 8916 to 10, which is only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that using characteristic wavelengths significantly improves and simplifies the regression model.
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technique that requires no sample preparation; it is well suited to on-line analysis of alloy composition. In the copper smelting industry, analysis and control of the copper alloy concentration greatly affect product quality, so LIBS is an efficient quantitative analysis technology for this industry. In lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects. It is therefore difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both major and trace elements were quantitatively analyzed. Comparing the two calibration methods showed that, for both major and trace elements, the PLS method performed better than the CC method in quantitative analysis. The regression coefficients of the PLS method are also compared with the original spectral data with background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results show that the PLS method applied to LIBS is suitable for simultaneous quantitative analysis of elements with different contents in the copper smelting industry.
Funding: National Natural Science Foundation of China, No. 40301038
Abstract: In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor is sufficient to describe land use patterns quantitatively, and that most of the statistical relations derived from it accord with the facts. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Funding: Supported by the National Natural Science Foundation of China (Nos. 21205145, 21276006, 21036009); the Open Funds of State Key Laboratory of Chemo/Biosensing and Chemometrics of Hunan University (No. 201111); the Special Fund for Basic Scientific Research of Central Colleges, South-Central University for Nationalities (Nos. CZZ10005 and CZQ11012); and the 'Five-twelfth' National Science and Technology Support Program (No. 2012BAI27B00).
Abstract: Rapid and sensitive recognition of herbal pieces prepared by different processing methods is crucial to quality control and pharmaceutical effect. Near-infrared (NIR) and mid-infrared (MIR) technology combined with supervised pattern recognition based on partial least-squares discriminant analysis (PLSDA) was used to classify and recognize six differently processed types of pieces among 600 Areca catechu L. samples, and the influence of fingerprint preprocessing methods on recognition performance was also investigated. Recognition rates of 99.24%, 100% and 99.49% were achieved by PLSDA models for the original fingerprint, the multiplicative scatter correction (MSC) fingerprint and the second derivative fingerprint of NIR spectra, respectively. Meanwhile, a perfect recognition rate of 100% was obtained for the same three fingerprint models of MIR spectra. In conclusion, PLSDA can rapidly and effectively extract distinguishing fingerprint information from NIR and MIR spectra to identify differently processed herbal pieces of A. catechu.
Abstract: Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52 vol alcoholicity). Liquors of 10 other brands with strong fragrant flavor were used as the interferential brands (200 samples, positive, 52 vol alcoholicity). The Kennard-Stone algorithm was used to divide the modeling samples to achieve uniformity and representativeness. Based on MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. This set contained five types of wavebands corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor for practical choice. Using five selected simple models with 4775-4239, 7804-6569, 6264-5844, 9435-7896, and 12066-10373 cm⁻¹, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
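The Kennard-Stone division mentioned above picks samples that cover the measurement space evenly. The sketch below shows the classic procedure under stated assumptions: Euclidean distance on raw feature vectors, and hypothetical names (`kennard_stone`, `points`). The abstract does not specify the authors' distance measure or implementation.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    Start with the two most distant samples, then repeatedly add the
    remaining sample whose nearest selected neighbour is farthest away.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed with the pair of samples separated by the largest distance.
    first, second, largest = 0, 1, -1.0
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] > largest:
                first, second, largest = i, j, dist[i][j]
    selected = [first, second]
    remaining = set(range(n)) - set(selected)

    while len(selected) < n_select:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining),
                  key=lambda r: min(dist[r][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical one-dimensional "spectra" (illustrative only).
points = [[0.0], [1.0], [2.0], [10.0], [5.0]]
calibration_idx = kennard_stone(points, 3)
```

In a calibration/prediction split, the selected indices would typically form the calibration set and the rest the prediction set. The pairwise distance matrix costs O(n²) memory, which is acceptable for a few hundred samples.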
Abstract: High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. Such wine is rich in nutrients and has a unique taste and a fragrant scent. Brand identification of wine is difficult and complex because of the high similarity between wines. In this paper, visible and near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent and two combined wavebands: visible (400-780 nm), short-NIR (780-1100 nm), long-NIR (1100-2498 nm), whole NIR (780-2498 nm) and whole scanning (400-2498 nm). In independent validation, all five models achieved good discrimination. Among them, the visible region model performed best, with validation recognition-accuracy rates for negative, positive and total samples of 100%, 95.6% and 97.5%, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.
Funding: Supported by the National Natural Science Foundation of China (No. 30725045), the Shanghai Leading Academic Discipline Project (No. B906), and in part by the Scientific Foundation of Shanghai, China (Nos. 07DZ19728, 06DZ19717 and 06DZ19005).
Abstract: For quality control purposes, an approach to fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established using a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH = 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r² > 0.9995), repeatability (RSD < 3.1%), intra- and inter-day precision (RSD < 1.8%) with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification according to their manufacturers were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.
Funding: Key Research and Development Program of Anhui Province (No. 201904a07020073); Science and Technology Foundation of Electronic Test & Measurement Laboratory (No. 6142001180307); National Basic Research Program (No. JSJL2018210C003).
Abstract: As one of the important indicators of a spectrometer, the signal-to-noise ratio (SNR) reflects the ability of the spectrometer to detect weak signals. To investigate the influence of SNR on the prediction accuracy of spectral analysis, we first introduce the major factors affecting the spectral SNR. Taking green tea as an example, the influence of spectral SNR on the prediction accuracy of an origin identification model is analyzed experimentally. The relationship between the spectral SNR and the prediction accuracy of the spectral analysis model is then fitted. On this basis, common methods for improving the spectral SNR are discussed. The results show that, as the spectral SNR decreases, the accuracy of the model on the prediction set first decreases slowly, then decreases linearly, and finally levels off. According to the calculation, for the prediction model to reach prediction accuracies of 90% and 85%, the spectral SNR must be higher than 23.42 dB and 21.16 dB, respectively. The overall results provide parameter support for the development of new online analytical spectroscopic instruments, especially for the technical specification of SNR.
Abstract: The identification of soy sauce adulteration can prevent fraud and protect the rights and interests of producers and consumers. Based on two measurement modes (1 mm and 10 mm), visible and near-infrared (Vis-NIR) spectroscopy combined with standard normal variate-partial least squares-discriminant analysis (SNV-PLS-DA) was used to establish discriminant analysis models for adulterated and brewed soy sauces. Chubang soy sauce was selected as the identification brand (negative, 70). The adulteration samples (positive, 72) were prepared by mixing Chubang soy sauce and blended soy sauce at different adulteration rates. The "blended soy sauce" sample was concocted from salt water (NaCl), monosodium glutamate (C₅H₁₀NNaO₅) and caramel color (C₆H₈O₃). A rigorous calibration-prediction-validation sample design was adopted. For the 1 mm case, five waveband models (visible, short-NIR, long-NIR, whole NIR and whole scanning regions) were established; for the 10 mm case, three waveband models (visible, short-NIR and visible-short-NIR regions) with unsaturated absorption were established. In independent validation, the models of all wavebands in both the 1 mm and 10 mm cases achieved good discrimination. For the 1 mm case, the visible model achieved the best validation result, with a validation recognition-accuracy rate (RAR_V) of 99.6%; for the 10 mm case, both the visible and visible-short-NIR models achieved the best validation result (RAR_V = 100%). The detection method requires no reagents and is fast and simple, which makes it easy to apply widely. The results can provide a valuable reference for designing small dedicated spectrometers with different measurement modes and different spectral regions.
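The SNV step in SNV-PLS-DA standardizes each spectrum individually: subtract the spectrum's own mean and divide by its own standard deviation. A minimal sketch follows, assuming the common convention of the sample standard deviation (n - 1 denominator). The function name `snv` and the absorbance values are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate transform of one spectrum (list of floats)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("a spectrum needs at least two points")
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator); an assumed convention.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if std == 0:
        raise ValueError("constant spectrum cannot be scaled")
    return [(x - mean) / std for x in spectrum]


# Hypothetical absorbance values (illustrative only).
raw = [0.2, 0.4, 0.6, 0.8]
corrected = snv(raw)
# The same spectrum with a baseline offset and a multiplicative gain.
shifted = snv([2.0 * x + 1.0 for x in raw])
```

Because the transform removes a constant offset and a positive multiplicative gain, `shifted` matches `corrected` up to rounding. This is why SNV is commonly used to reduce scatter-related variation between spectra before discriminant modeling.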