Near-infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) rice from wild-type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated from each other by the NIRS method. The performance of PLS-DA in the spectral ranges 4000-8000 cm⁻¹ and 4000-10000 cm⁻¹ was compared to obtain the optimal spectral range. As a result, the transgenic and wild-type rice were distinguished from each other in the range 4000-10000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range 4000-10000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
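The PLS-DA idea used in the abstract above (regress a 0/1 class label on the spectra, then threshold the predicted value) can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single latent variable, two classes, a fixed 0.5 decision threshold, and hypothetical function names (`fit_pls_da_1`, `predict_class`); real spectra would normally need several latent variables chosen by cross-validation.

```python
import math


def fit_pls_da_1(X, y):
    """Fit a one-latent-variable PLS1 model to 0/1 class labels.

    X is a list of rows (one spectrum per row), y a list of 0/1 labels.
    Returns the column means, the label mean, and the coefficient vector.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X carries no covariance with y")
    w = [v / norm for v in w]

    # Scores t = Xc w, and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

    # With a single component the coefficient vector reduces to q * w.
    b = [q * wj for wj in w]
    return x_mean, y_mean, b


def predict_class(model, x, threshold=0.5):
    """Predict class 1 when the fitted label value reaches the threshold."""
    x_mean, y_mean, b = model
    y_hat = y_mean + sum((xj - mj) * bj for xj, mj, bj in zip(x, x_mean, b))
    return 1 if y_hat >= threshold else 0
```

For example, fitting on a few toy two-variable "spectra" with labels 0 and 1 and then calling `predict_class(model, new_row)` returns the predicted class of a new sample.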
Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance achieved using our method with those obtained by single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and magnitude of gene effects influence the statistical power, accuracy and precision of effect estimation by the joint association analysis.
The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and the same alcohol content is essential. However, it is also difficult because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates reached 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of discriminant analysis of liquor brands, and is also a promising tool for large-scale inspection of liquor food safety.
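The Kennard-Stone division mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the usual rule (start with the two most distant samples, then repeatedly add the sample farthest from those already chosen), assuming Euclidean distance on the raw variables; the function name `kennard_stone` is hypothetical and the abstract does not state which distance or preprocessing the authors used.

```python
import math


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule.

    samples is a list of equal-length numeric sequences (one per sample).
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise Euclidean distance table.
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed the selection with the two mutually most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the candidate whose nearest selected sample is farthest.
    while len(selected) < n_select:
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The selected indices would form the calibration set and the remaining samples the prediction set, which is how the split spreads calibration samples evenly over the measured space.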
Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a brand product of proprietary Chinese medicine (PCM). A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400-1880 and 2088-2346 nm. Using random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L⁻¹ and 0.888, respectively. The results indicate that the NIR predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantification of polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
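The two validation figures quoted in the abstract above, the root-mean-square error of prediction and the correlation coefficient of prediction, follow standard definitions that can be written out in a few lines. The sketch below assumes paired lists of measured and predicted values for the validation samples; the function names `rmsep` and `pearson_r` are hypothetical.

```python
import math


def rmsep(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)
```

A low error together with a correlation close to 1 is what "prediction values are close to the measured values" means in practice; the error carries the units of the analyte (here mg·L⁻¹), while the correlation is dimensionless.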
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
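The conservativeness of Bonferroni-corrected single-locus testing noted in the abstract above comes from dividing the significance level by the number of tests. A minimal sketch, assuming a list of per-SNP p-values and a family-wise level of 0.05 (the function name `bonferroni_significant` is hypothetical and the p-values in any usage are illustrative, not taken from the study):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values pass the Bonferroni-corrected threshold alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test level that keeps family-wise error <= alpha
    return [p <= threshold for p in p_values]
```

With four SNPs the per-test threshold is 0.0125, so a SNP with p = 0.02 that would pass an uncorrected 0.05 test is no longer declared significant. When SNPs in a region are in linkage disequilibrium the tests are correlated, and this correction becomes stricter than necessary, which is consistent with the lower power reported for LR.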
Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber range 4348-9091 cm⁻¹, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for the training set and 100% for the test set, and the lowest detected concentration of malathion residues in water was 1 μg·mL⁻¹. Kernel partial least squares-discriminant analysis performed well in classifying data from nonlinear systems. It was inferred that near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis has potential for the rapid screening of other pesticide residues in water.
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based only on empirical knowledge and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which are 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technology that requires no sample preparation, which makes it very suitable for on-line analysis of alloy composition. In the copper smelting industry, analysis and control of the copper alloy concentration greatly affect product quality, so LIBS is an efficient quantitative analysis technology for this industry. For lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects. It is therefore difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both major and trace elements were quantitatively analyzed. Comparing the results of the two calibration methods showed that, for both major and trace elements, the PLS method was better than the CC method in quantitative analysis. The regression coefficients of the PLS method are also compared with the original spectral data with background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results showed that the PLS method applied to LIBS is suitable for the simultaneous quantitative analysis of elements of different content in the copper smelting industry.
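To investigate the key volatile flavor compounds in the muscle of Ningdu yellow chickens under different rearing systems, the test chickens were randomly divided into a cage-reared group and a floor-reared group and fed the same diet. When the chickens reached market age, the meat was subjected to sensory evaluation and analysis of volatile flavor compounds, and orthogonal partial least squares-discriminant analysis (OPLS-DA) was used to screen for differential flavor compounds associated with the rearing system. The results showed that the floor-reared and cage-reared groups shared 27 volatile flavor compounds, mainly phenols, alcohols and hydrocarbons. Among the volatile flavor compounds, hexanal, 1-octen-3-ol, (E)-2-nonenal, 1-hexanol, nonanal, 2,3-pentanedione, decanal, 2,3-octanedione and (E)-2-octenal differed significantly between the groups. In summary, this study provides a scientific basis for evaluating the meat quality of local chicken breeds on the basis of flavor compounds.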
As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Changes in VOC concentration should therefore be a focus of real-time environmental monitoring systems. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select the optimal characteristic wavelengths of VOC spectra. This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the number of characteristic wavelengths selected by the iPLS-MC method is reduced from 8916 to 10, which is only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that the regression model can be significantly simplified and its prediction performance significantly improved by using characteristic wavelengths.
In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the variables tend to be dependent, a phenomenon known as multicollinearity, especially when observations are few. In this paper, a partial least squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor best describes land use patterns quantitatively and that most of the statistical relations derived from it accord with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Breast cancer is one of the malignant tumors with a high incidence in women, and its incidence has increased in all parts of the world since the twentieth century. Because its etiology is not yet completely clear, detecting abnormal breast cells is very important. In this paper, we built a regression model to classify breast cells as benign or malignant and trained it using 10 features of breast cells. The model reached an accuracy of up to 93.67%, indicating that it is effective for predicting and analysing whether breast cells are cancerous, and it could play an important role in the diagnosis and prevention of breast cancer.
Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants, and has devastated millions of hectares of farmland in China. The objective of this study is to detect powdery mildew damage at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, reflectance spectra and biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. Using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least squares regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the chlorophyll content. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features outperformed (R² = 0.77, relative root mean square error RRMSE = 0.28) the model composed of 14 optimal conventional SFs (R² = 0.69, RRMSE = 0.32) in estimating disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising for providing relatively accurate estimates of powdery mildew disease severity at the leaf level.
An experimental setup has been designed and realized in order to optimize the characteristics of a laser-induced breakdown spectroscopy system working in various pressure environments. An approach combining normalization methods with the partial least squares (PLS) method is developed for quantitative analysis of molybdenum (Mo) in the multi-component alloy that serves as the first wall material in the Experimental Advanced Superconducting Tokamak. In this study, different spectral normalization methods (total spectral area normalization, background normalization, and reference line normalization) are investigated for reducing the uncertainty and improving the accuracy of spectral measurement. The results indicate that the PLS approach based on inter-element interference is significantly better than the conventional PLS methods as well as the univariate linear methods for molybdenum analysis at the various pressures.
Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with a strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52 vol alcoholicity). Liquors of 10 other brands with a strong fragrant flavor were used as the interferential brands (200 samples, positive, 52 vol alcoholicity). The Kennard-Stone algorithm was used for the division of modeling samples to achieve uniformity and representativeness. Based on MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. This set contained five types of wavebands corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor, offering a practical choice of models. Using five selected simple models with wavebands of 4775-4239, 7804-6569, 6264-5844, 9435-7896, and 12066-10373 cm⁻¹, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and also provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
For quality control purposes, an approach for fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established using a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r² > 0.9995), repeatability (RSD < 3.1%), intra- and inter-day precision (RSD < 1.8%) with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification according to manufacturer were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.
Funding (transgenic rice NIRS study): Supported by projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Funding (joint association analysis study): Supported by grants from the National Program on the Development of Basic Research (2011CB100100); the Priority Academic Program Development of Jiangsu Higher Education Institutions; the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103); the National High-tech R&D Program (863 Program) (2014AA10A601-5); the Natural Science Foundations of Jiangsu Province (BK20150010); the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005); and the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352).
Funding (GWAS SNP-set comparison study): Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267); the Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001); the Research Fund for the Doctoral Program of Higher Education of China (20113234110002); and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It performed significantly better than variable selection based on empirical knowledge alone and in most cases outperformed the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which were 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
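The hybrid idea in this abstract (data-driven ranking combined with empirically chosen variables) can be illustrated with a minimal sketch. It covers only the Pearson-correlation branch; the mutual information, LASSO and random forest branches, the filtering step and the PLSR model are omitted. The function names, the `top_k` parameter and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def hybrid_select(X, y, top_k, empirical_idx):
    """Union of the top_k channels ranked by |r| and the empirical channels."""
    n_vars = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_vars)]
    ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
    return sorted(set(ranked[:top_k]) | set(empirical_idx))


# Hypothetical toy data: 4 samples x 4 spectral channels, y = ash content.
X = [
    [1.0, 5.0, 0.3, 2.0],
    [2.0, 5.1, 0.1, 1.0],
    [3.0, 4.9, 0.4, 2.5],
    [4.0, 5.0, 0.2, 0.5],
]
y = [10.0, 20.0, 30.0, 40.0]

# Channel 2 stands in for an empirically known fingerprint line (assumption).
selected = hybrid_select(X, y, top_k=2, empirical_idx=[2])
```

In this toy case channel 2 is nearly uncorrelated with the response, so it survives only because it was supplied as an empirical variable, which is the situation the hybrid strategy is meant to cover.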
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technology that requires no sample preparation, and it is very suitable for on-line analysis of alloy composition. In the copper smelting industry, analysis and control of the copper alloy concentration greatly affect the quality of the products, so LIBS is an efficient quantitative analysis technology for this industry. For lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects. It is therefore difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both the major and trace elements were quantitatively analyzed. Comparing the results of the two calibration methods showed that, for both major and trace elements, the PLS method was better than the CC method in quantitative analysis. The regression coefficients of the PLS method are compared with the original spectral data with background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results show that the PLS method applied to LIBS is suitable for simultaneous quantitative analysis of elements at different content levels in the copper smelting industry.
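Abstract: To study the key volatile flavor compounds in the muscle of Ningdu yellow chickens under different rearing systems, the experimental chickens were randomly divided into a cage-rearing group and a floor-rearing group and fed the same diet. When the chickens reached market age, the meat was subjected to sensory evaluation and volatile flavor compound analysis, and orthogonal partial least squares-discriminant analysis (OPLS-DA) was used to screen the differential flavor compounds associated with the rearing system. The results showed that the floor-rearing and cage-rearing groups shared 27 volatile flavor compounds, mainly phenols, alcohols and hydrocarbons. Among the volatile flavor compounds, hexanal, 1-octen-3-ol, (E)-2-nonenal, 1-hexanol, nonanal, 2,3-pentanedione, decanal, 2,3-octanedione and (E)-2-octenal differed significantly between the groups. In summary, this study provides a scientific basis for evaluating the quality of local chicken meat based on flavor compounds.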
Funding: Supported by the National Key Scientific Instrument and Equipment Development Project of China (Grant No. 2013YQ220643) and the National 863 Program of China (Grant No. 2014AA06A503).
Abstract: As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Changes in VOC concentration should therefore be tracked in real-time environmental monitoring systems. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select optimal characteristic wavelengths of VOC spectra. This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the number of characteristic wavelengths selected by the iPLS-MC method is reduced from 8916 to 10, which occupies only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that the regression model can be significantly improved in prediction performance and simplified by using characteristic wavelengths.
Funding: National Natural Science Foundation of China, No. 40301038
Abstract: In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the variables tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor best describes land use patterns quantitatively, with most of the statistical relations derived from it according with the facts. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
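To make concrete why PLS regression tolerates multicollinearity, the following is a minimal single-response PLS (PLS1, NIPALS-style) sketch in plain Python. It is a generic textbook formulation, not the model used in the study; the function names and the toy data, in which the second predictor is exactly twice the first, are assumptions chosen for illustration.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns means and (w, p, q) per factor."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                # centred response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                    # y loading
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat


# Hypothetical toy data: column 2 is exactly 2 x column 1, and y = 3*x1 + 1.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [4.0, 7.0, 10.0, 13.0]
model = pls1_fit(X, y, n_components=1)
prediction = pls1_predict(model, [5.0, 10.0])
```

With perfectly collinear predictors the ordinary least squares normal equations are singular, whereas a single PLS factor still recovers the relationship, because the regression is carried out on the score vector rather than on the raw predictors.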
Abstract: Breast cancer is one of the malignant tumors with a high incidence in women. Its incidence has increased in all parts of the world since the twentieth century, but its etiology is not yet completely clear, so the detection of breast cells is very important. In this paper, we built a regression model to detect breast cells and, by training the model, obtained a method for predicting whether breast cells are benign or malignant. Using 10 features of breast cells for prediction, the model reached an accuracy of up to 93.67%. The method was very effective in predicting and analysing whether breast cells are cancerous, and it can play an important role in the diagnosis and prevention of breast cancer.
Funding: The National Natural Science Foundation of China (41101395, 41071276, 31071324), the Beijing Municipal Natural Science Foundation, China (4122032), and the National Basic Research Program of China (2011CB311806).
Abstract: Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants, and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and the biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. By using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least squares regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the chlorophyll content. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features outperformed (R2=0.77, relative root mean square error RRMSE=0.28) the model composed of 14 optimal conventional SFs (R2=0.69, RRMSE=0.32) in estimating the disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising in providing relatively accurate estimates of disease severity of powdery mildew at the leaf level.
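The two accuracy figures quoted in this abstract can be computed as in the sketch below. The abstract does not state its exact formulas, so the definitions used here are assumptions: R2 as one minus the ratio of residual to total sum of squares, and RRMSE as the RMSE divided by the mean of the observed values. The toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rrmse(observed, predicted):
    """Relative RMSE: RMSE divided by the observed mean (assumed definition)."""
    n = len(observed)
    rmse = (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5
    return rmse / (sum(observed) / n)


# Hypothetical disease-severity values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2_value = r_squared(observed, predicted)
rrmse_value = rrmse(observed, predicted)
```

Under these definitions a higher R2 together with a lower RRMSE indicates the better model, which is the direction of the comparison reported for the wavelet features versus the conventional spectral features.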
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (No. 2017YFE0301304), the National Natural Science Foundation of China (Nos. 11475039, 11605023, 11705020), the China Postdoctoral Science Foundation (Nos. 2016M591423, 2017T100172, 2018M630285), the Fundamental Research Funds for the Central Universities (Nos. DUT15RC(3)072, DUT17RC(4)53, DUT18LK38), and the Liaoning Provincial Natural Science Foundation of China (No. 20170540153).
Abstract: An experimental setup has been designed and realized in order to optimize the characteristics of a laser-induced breakdown spectroscopy system working in various pressure environments. An approach combining normalization methods with the partial least squares (PLS) method was developed for quantitative analysis of the molybdenum (Mo) element in a multi-component alloy, which is the first wall material in the Experimental Advanced Superconducting Tokamak. In this study, different spectral normalization methods (total spectral area normalization, background normalization, and reference line normalization) are investigated for reducing the uncertainty and improving the accuracy of spectral measurement. The results indicate that the approach of PLS based on inter-element interference is significantly better than the conventional PLS methods as well as the univariate linear methods at the various pressures for molybdenum element analysis.
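The three normalization schemes named in this abstract can be sketched as simple rescalings of a spectrum. This is an illustration under stated assumptions, not the authors' procedure: channels are assumed equally spaced so that the channel sum stands in for the spectral area, and the background window and reference line index are hypothetical choices.

```python
def normalize_total_area(intensities):
    """Divide by the summed intensity (proportional to area for equal spacing)."""
    total = sum(intensities)
    return [v / total for v in intensities]


def normalize_background(intensities, background_idx):
    """Divide by the mean intensity of channels assumed to be line-free."""
    background = sum(intensities[i] for i in background_idx) / len(background_idx)
    return [v / background for v in intensities]


def normalize_reference_line(intensities, reference_idx):
    """Divide by the intensity of a chosen reference emission line."""
    reference = intensities[reference_idx]
    return [v / reference for v in intensities]


# Hypothetical 4-channel spectrum: channel 0 is background, channel 2 a strong line.
spectrum = [2.0, 4.0, 10.0, 4.0]
by_area = normalize_total_area(spectrum)
by_background = normalize_background(spectrum, background_idx=[0])
by_reference = normalize_reference_line(spectrum, reference_idx=2)
```

Each scheme removes a different source of shot-to-shot variation, so in practice the normalized spectra, rather than the raw intensities, would be passed to the PLS model.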
Abstract: Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with a strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52 vol alcoholicity). Liquors of 10 other brands with a strong fragrant flavor were used as the interferential brands (200 samples, positive, 52 vol alcoholicity). The Kennard-Stone algorithm was used for the division of modeling samples to achieve uniformity and representativeness. Based on the MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. This set contained five types of wavebands corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor for practical choice. Using five selected simple models with 4775 - 4239, 7804 - 6569, 6264 - 5844, 9435 - 7896, and 12066 - 10373 cm-1, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
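The Kennard-Stone algorithm mentioned in this abstract selects samples so that they cover the feature space uniformly. The sketch below is a generic version using Euclidean distance; the function name and the two-dimensional toy points are assumptions, and real NIR spectra would have many more channels.

```python
from math import dist


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    while len(selected) < n_select:
        # Add the sample whose nearest already-selected neighbour is farthest away.
        nxt = max(
            sorted(remaining),
            key=lambda i: min(dist(samples[i], samples[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical 2-D points standing in for spectra.
spectra = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.1), (10.0, 0.0), (5.0, 0.0)]
calibration_idx = kennard_stone(spectra, 3)
```

The selected indices would form the modeling (calibration) set and the remaining samples the validation set. Near-duplicates such as the points close to the origin are picked last, which is what gives the split its uniformity.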
Funding: Supported by the National Natural Science Foundation of China (No. 30725045), the Shanghai Leading Academic Discipline Project (No. B906), and in part by the Scientific Foundation of Shanghai, China (Nos. 07DZ19728, 06DZ19717 and 06DZ19005).
Abstract: For quality control purposes, an approach for fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established via a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH = 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r^2 > 0.9995), repeatability (RSD < 3.1%), intra- and inter-day precision (RSD < 1.8%) with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification according to their manufacturers were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.