Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) rice from wild-type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performance of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ was compared to obtain the optimal spectral range. As a result, the transgenic and wild-type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, with a correct classification rate of 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
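To make the PLS-DA step in the abstract above concrete, here is a minimal sketch of a two-class PLS-DA built on a NIPALS PLS1 regression, with the class label coded as 0/1 and a 0.5 decision threshold. It is an illustration under stated assumptions, not the authors' implementation: the synthetic "spectra" from the hypothetical helper `make_spectra`, the number of components, and the threshold are all invented for the example, and the abstract does not say which software or preprocessing was used.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weights: covariance of each channel with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # sample scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in original X space
    return coef, x_mean, y_mean

def plsda_predict(model, X, threshold=0.5):
    """Classify by thresholding the PLS1 prediction of the 0/1 class code."""
    coef, x_mean, y_mean = model
    return ((X - x_mean) @ coef + y_mean > threshold).astype(int)

def make_spectra(rng, n_per_class, n_channels=50):
    """Hypothetical data: class 1 has an extra absorption band in channels 10-15."""
    X = rng.normal(0.0, 0.1, size=(2 * n_per_class, n_channels))
    X[n_per_class:, 10:16] += 1.0
    y = np.repeat([0.0, 1.0], n_per_class)
    return X, y

X_train, y_train = make_spectra(np.random.default_rng(0), 20)
model = pls1_fit(X_train, y_train, n_components=2)
train_accuracy = float((plsda_predict(model, X_train) == y_train).mean())
print("training accuracy:", train_accuracy)
```

In practice the correct classification rate should be reported on samples held out from model fitting, as the abstract does with its validation test.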
Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance of our method with that of single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and the magnitude of gene effects influence the statistical power of the joint association analysis and the accuracy and precision of its effect estimates.
The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and alcohol content is essential. However, it is also difficult because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates reached 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means for the discriminant analysis of liquor brands, and is also a promising tool for large-scale inspection of liquor food safety.
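The Kennard-Stone division mentioned in the abstract above can be sketched as follows. This is the textbook form of the algorithm (start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away), using Euclidean distance on the raw variables; the distance metric, the set sizes, and the function name `kennard_stone` are assumptions for illustration, since the abstract does not give these details.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices; requires n_select >= 2."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for a few hundred samples).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the calibration set with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # For each candidate, distance to its nearest selected sample ...
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # ... and pick the candidate for which that distance is largest.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy example with one variable per sample.
calibration_idx, prediction_idx = kennard_stone([[0.0], [1.0], [2.0], [10.0]], 3)
print(calibration_idx, prediction_idx)
```

The selected indices would form the calibration set and the remaining ones the prediction set, which spreads the calibration samples evenly over the measured space.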
Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technology that requires no sample preparation; it is well suited to on-line analysis of alloy composition. In the copper smelting industry, analysis and control of the copper alloy concentration greatly affect product quality, so LIBS is an efficient quantitative analysis technology for this industry. For lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects, so it is difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both major and trace elements were quantitatively analyzed. Comparing the results of the two calibration methods shows that, for both major and trace elements, the PLS method performed better than the CC method in quantitative analysis. The regression coefficients of the PLS method are also compared with the original spectral data containing background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results show that the PLS method applied to LIBS is suitable for simultaneous quantitative analysis of elements with different contents in the copper smelting industry.
Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a brand product of proprietary Chinese medicine (PCM). A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400-1880 & 2088-2346 nm. Using random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L⁻¹ and 0.888, respectively. The results indicate that the NIR-predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantifying polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
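The Bonferroni correction applied to the single-locus LR tests in the abstract above divides the significance level by the number of tests, which is why it becomes conservative when the SNPs in a set are correlated. A minimal sketch, assuming a list of per-SNP p-values is already available (the function name and the p-values below are made up for illustration):

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted level alpha / m."""
    m = len(p_values)
    threshold = alpha / m          # per-test significance level
    return [p <= threshold for p in p_values]

# Three hypothetical SNP p-values: only the first passes 0.05 / 3.
decisions = bonferroni_reject([0.001, 0.02, 0.04])
print(decisions)
```

A SNP-set method such as PC-LR or PLS-LR instead tests a small number of derived components, so it avoids paying this per-SNP penalty.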
Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber range of 4348 cm⁻¹ to 9091 cm⁻¹, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for the training set and 100% for the test set, and the lowest detected concentration of malathion residues in water was 1 μg·mL⁻¹. Kernel partial least squares-discriminant analysis performed well in classifying data from nonlinear systems. It was inferred that near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis has potential for the rapid screening of other pesticide residues in water.
In several LUCC studies, statistical methods are used to analyze land use data. A problem with using conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a partial least squares (PLS) regression approach is developed to study the relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and its influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor is best able to describe land use patterns quantitatively, and that most of the statistical relations derived from it accord with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Changes in VOC concentration should therefore be a focus of real-time environmental monitoring systems. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select the optimal characteristic wavelengths of VOC spectra. This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the number of characteristic wavelengths selected by the iPLS-MC method is reduced from 8916 to 10, which occupies only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that the regression model can be simplified and its prediction performance significantly improved by using the characteristic wavelengths.
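The Monte-Carlo stage described in the abstract above (draw random wavelength combinations from preselected wavebands, keep the combination with the lowest cross-validated error) can be sketched as below. Several simplifications are assumptions of this sketch and not the authors' method: the iPLS preselection is taken as already done and passed in as `candidate_idx`, ordinary least squares is swapped in for the PLS regression to keep the example dependency-free, and the fold layout, function names, and synthetic data are invented for illustration.

```python
import numpy as np

def rmsecv_ols(X, y, n_folds=5):
    """Cross-validated RMSE of an ordinary least squares model with intercept."""
    n = len(y)
    folds = np.arange(n) % n_folds            # simple interleaved fold assignment
    sq_err = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
        sq_err += np.sum((pred - y[test]) ** 2)
    return float(np.sqrt(sq_err / n))

def monte_carlo_select(X, y, candidate_idx, n_pick, n_runs, seed=0):
    """Randomly sample wavelength subsets and keep the one with the lowest RMSECV."""
    rng = np.random.default_rng(seed)
    best_idx, best_err = None, np.inf
    for _ in range(n_runs):
        idx = np.sort(rng.choice(candidate_idx, size=n_pick, replace=False))
        err = rmsecv_ols(X[:, idx], y)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err

# Synthetic check: the response depends only on channel 5.
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(40, 30))
y_demo = 2.0 * X_demo[:, 5]
best_idx, best_err = monte_carlo_select(X_demo, y_demo, np.arange(10), n_pick=3, n_runs=200)
print(best_idx, best_err)
```

With a real PLS model in place of `rmsecv_ols`, the same loop would correspond to the "1000 Monte-Carlo runs" setting quoted in the abstract.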
Breast cancer is one of the malignant tumors with a high incidence in women. Its incidence has increased in all parts of the world since the twentieth century, but its etiology is not yet completely clear, so the detection of breast cells is very important. In this paper, we built a regression model to detect breast cells and, by training the model, obtained a method for predicting whether breast cells are benign or malignant. Using 10 features of breast cells for prediction, the model reached an accuracy of up to 93.67%. The method is effective for predicting and analysing whether breast cells are cancerous, and it can play an important role in the diagnosis and prevention of breast cancer.
Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants, and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and the biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and the pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. By using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least squares regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at leaf level were found to be closely related to the spectral characteristics of the pustule area and the content of chlorophyll. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features outperformed (R2=0.77, relative root mean square error RRMSE=0.28) the model composed of 14 optimal conventional SFs (R2=0.69, RRMSE=0.32) in estimating disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising for providing relatively accurate estimates of the disease severity of powdery mildew at leaf level.
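The two accuracy measures quoted above can be computed as in the following sketch. The definitions used here are common ones but are assumptions with respect to this study: R2 is taken as the coefficient of determination, and RRMSE as the root mean square error divided by the mean of the observed values. The abstract does not state which definitions the authors used, and the numbers below are toy values, not data from the paper.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rrmse(observed, predicted):
    """Relative RMSE: RMSE divided by the mean observed value (assumed definition)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return float(rmse / observed.mean())

severity_observed = [1.0, 2.0, 3.0, 4.0]     # toy disease severity scores
severity_predicted = [2.0, 2.0, 3.0, 3.0]    # toy model output
print(r_squared(severity_observed, severity_predicted))
print(rrmse(severity_observed, severity_predicted))
```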
An experimental setup has been designed and realized in order to optimize the characteristics of a laser-induced breakdown spectroscopy system working in various pressure environments. An approach combining normalization methods with the partial least squares (PLS) method was developed for quantitative analysis of the molybdenum (Mo) element in a multi-component alloy, which is the first wall material in the Experimental Advanced Superconducting Tokamak. In this study, different spectral normalization methods (total spectral area normalization, background normalization, and reference line normalization) are investigated for reducing the uncertainty and improving the accuracy of spectral measurement. The results indicate that the PLS approach based on inter-element interference is significantly better than the conventional PLS methods, as well as the univariate linear methods, for molybdenum analysis at the various pressures.
Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with a strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52 vol alcoholicity). Liquors of 10 other brands with a strong fragrant flavor were used as the interferential brands (200 samples, positive, 52 vol alcoholicity). The Kennard-Stone algorithm was used to divide the modeling samples to achieve uniformity and representativeness. Based on MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. This set contained five types of wavebands, corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor, for practical choice. Using five selected simple models with wavebands of 4775-4239, 7804-6569, 6264-5844, 9435-7896, and 12066-10373 cm⁻¹, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and also provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
For quality control purposes, an approach for fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established using a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r² > 0.9995), repeatability (RSD < 3.1%), and intra- and inter-day precision (RSD < 1.8%), with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification according to manufacturer were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.
High-end wine is made from high-quality grape varieties and yeast strains through a unique process. It is not only rich in nutrients but also has a unique taste and a fragrant scent. Brand identification of wine is difficult and complex because of the high similarity between wines. In this paper, visible and near-infrared (Vis-NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent wavebands and two composite wavebands: visible (400-780 nm), short-NIR (780-1100 nm), long-NIR (1100-2498 nm), whole NIR (780-2498 nm) and whole scanning region (400-2498 nm). In independent validation, all five models achieved good discrimination. Among them, the visible-region model performed best, with validation recognition-accuracy rates for negative, positive and total samples of 100%, 95.6% and 97.5%, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.
In this study, multivariate analysis methods, including principal component analysis (PCA) and partial least squares (PLS) analysis, were applied to reveal the inner relationships among the key variables in the H₂O₂-assisted Na₂CO₃ (HSC) pretreatment of corn stover. A total of 120 pretreatment experiments were carried out at lab scale under different conditions by varying the particle size of the corn stover and the process variables. The results showed that the Na₂CO₃ dosage and pretreatment temperature had a strong influence on lignin removal, whereas pulp refining instrument (PFI) refining and Na₂CO₃ dosage played positive roles in the final total sugar yield. Furthermore, it was found that the pretreatment conditions had a more significant impact on improving pretreatment effectiveness than the properties of the raw corn stover. In addition, the effectiveness of corn stover HSC pretreatment was predicted with a PLS analysis for the first time. Predictability tests based on additional pretreatment experiments showed that the developed PLS model achieved good predictive performance (particularly for the final total sugar yield), indicating that it can be used to predict the effectiveness of HSC pretreatment. Therefore, multivariate analysis can potentially be used to monitor and control the pretreatment process in future large-scale biorefinery applications.
The performance of different chemometric approaches was evaluated in the spectrophotometric determination of pharmaceutical mixtures characterized by very high ratios between the amounts of their components. Principal component regression (PCR), partial least squares with one dependent variable (PLS1) or multiple dependent variables (PLS2), and multivariate curve resolution (MCR) were applied to the spectral data of a ternary mixture containing paracetamol, sodium ascorbate and chlorpheniramine (150:140:1, m/m/m), and a quaternary mixture containing paracetamol, caffeine, phenylephrine and chlorpheniramine (125:6.25:1.25:1, m/m/m/m). The UV spectra of the calibration samples in the range of 200-320 nm were pre-treated by removing noise and useless data, and the wavelength regions carrying the most useful analytical information were selected using the regression coefficients calculated in the multivariate modeling. All the defined chemometric models were validated on external sample sets and then applied to commercial pharmaceutical formulations. Different data intervals, fixed at 0.5, 1.0, and 2.0 point/nm, were tested to optimize the prediction ability of the models. The best results were obtained using the PLS1 calibration models, and the quantification of the species present in lower amounts was significantly improved by adopting the 0.5 data interval, which showed accuracy between 94.24% and 107.76%.
Near-infrared (NIR) spectroscopy combined with chemometric methods was applied to the rapid and reagent-free analysis of serum urea nitrogen (SUN). Multi-partition modeling was performed to achieve parameter stability. A large-scale cyclic and global parameter optimization platform for the Norris derivative filter (NDF), which has three parameters (the derivative order d, the number of smoothing points s, and the number of differential gaps g), was developed with PLS regression. An adaptive analysis of the NDF parameters was also given, and the approach achieved a significantly better modeling effect than modeling without spectral pre-processing. After eliminating the interfering wavebands of saturated absorption, the modeling performance was further improved. In validation, the root mean square error (SEP) and correlation coefficient (RP) for prediction and the ratio of performance to deviation (RPD) were 1.66 mmol·L⁻¹, 0.966 and 4.7, respectively. The results showed that high-precision analysis of SUN is feasible based on NIR spectroscopy and Norris-PLS. The global optimization method for NDF is also expected to be applicable to other analysis objects.
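The three Norris derivative filter parameters named in the abstract above (d, s, g) can be illustrated with the sketch below. It uses one common formulation, a moving-average smooth over s points followed by d rounds of a central gap difference with gap g; implementations differ in how they scale the difference and treat the spectrum edges, so this should be read as an assumption-laden illustration and not the platform described by the authors.

```python
import numpy as np

def norris_derivative(spectrum, d=1, s=5, g=3):
    """Moving-average smoothing (s points, odd) then d gap differences (gap g >= 1)."""
    y = np.asarray(spectrum, dtype=float)
    if s > 1:
        # Moving-average smoothing; values near the edges are distorted by zero padding.
        y = np.convolve(y, np.ones(s) / s, mode="same")
    for _ in range(d):
        out = np.zeros_like(y)
        # Central difference across a gap of g points on each side, per unit index.
        out[g:-g] = (y[2 * g:] - y[:-2 * g]) / (2 * g)
        y = out
    return y

# A straight-line "spectrum" with slope 2 should give a first derivative of 2
# everywhere away from the edges.
line = 2.0 * np.arange(50)
first_derivative = norris_derivative(line, d=1, s=5, g=3)
print(first_derivative[10:15])
```

A global search of the kind described in the abstract would loop over combinations of d, s and g, apply the filter to every spectrum, and compare the resulting PLS prediction errors.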
A method for the optimal selection of spectral regions based on grey correlation analysis was applied to the analysis of near-infrared (NIR) spectra. In order to compute the "characteristic" spectral region, 160 tobacco samples were measured by NIR. The whole spectral region was then randomly divided into six regions, and the association coefficients and correlation orders of the different regions were computed for total sugar, reducing sugar and nicotine. The two regions with the largest association coefficients were taken as the "characteristic" spectral region of a model. Finally, quantitative analysis models for the different components were established via the partial least squares method, and the common spectral region selection methods were compared. The simulation results indicate that models whose spectral region is chosen by grey correlation analysis are more effective than those using the common selection methods: the optimization time of the algorithm is shorter, the prediction precision of the models is higher, and the generalization ability of the quantitative analysis results is stronger. This research can support quantitative analysis models of NIR spectra and offers a new idea for commercial NIR analysis software, so it has high application value in the analysis of NIR spectra.
In this study, two functional logistic regression models, one with a functional principal component basis (FPCA) and one with a functional partial least squares basis (FPLS), have been developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purposes of classification and interpretation. The classification performance of the two functional models has been compared with that of two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The results indicated that the FPCA and FPLS models outperformed the PCDA and PLSDA models in classification while using a small number of functional basis components. With a substantial reduction in model complexity and an improvement in classification accuracy, the functional approach is particularly helpful for interpreting the complex spectral features related to precancerous colon polyps.
Funding (transgenic rice NIRS/PLS-DA study): Supported by projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Funding (joint association analysis study): Supported by grants from the National Program on the Development of Basic Research (2011CB100100); the Priority Academic Program Development of Jiangsu Higher Education Institutions; the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103); the National High-tech R&D Program (863 Program) (2014AA10A601-5); the Natural Science Foundations of Jiangsu Province (BK20150010); the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005); and the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352).
Abstract: Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance achieved using our method with that obtained by single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and magnitude of gene effects influence the statistical power, accuracy and precision of effect estimation by the joint association analysis.
Abstract: The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as a regular brand, so identifying liquor brands with the same flavor and alcohol content is essential. It is also difficult, because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates reached 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means for the discriminant analysis of liquor brands and is also a promising tool for large-scale inspection of liquor food safety.
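The Kennard-Stone division mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance between samples, and the function name `kennard_stone` and the toy points are hypothetical.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    Assumption: samples are equal-length numeric sequences compared with
    Euclidean distance (the abstract does not state the metric).
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")

    # Full pairwise distance table (fine for small illustrative data sets).
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]

    while len(selected) < n_select:
        # Add the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy one-dimensional "spectra" embedded in 2-D, for illustration only.
spectra = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.0), (0.5, 0.0), (0.9, 0.0)]
calibration_idx = kennard_stone(spectra, 3)
```

The selected indices would form the calibration set and the remaining samples the prediction set; the rule spreads the calibration samples evenly over the measured space.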
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a fast, non-contact analytical technology that requires no sample preparation; it is very suitable for on-line analysis of alloy composition. In the copper smelting industry, analysis and control of the copper alloy concentration greatly affect the quality of the products, so LIBS is an efficient quantitative analysis technology for this industry. For lead brass, however, the contents of Pb, Al and Ni are very low, and their atomic emission lines are easily submerged under the complex characteristic spectral lines of copper because of matrix effects, so it is difficult to obtain on-line quantitative results for these important elements. In this paper, both the partial least squares (PLS) method and the calibration curve (CC) method are used to quantitatively analyze LIBS data obtained from standard lead brass alloy samples. Both the major and trace elements were quantitatively analyzed. Comparing the results of the two calibration methods showed that, for both major and trace elements, the PLS method was better than the CC method in quantitative analysis. The regression coefficients of the PLS method are compared with the original spectral data with background interference to explain the advantage of the PLS method in LIBS quantitative analysis. The results show that the PLS method applied to LIBS is suitable for the simultaneous quantitative analysis of elements at different content levels in the copper smelting industry.
Abstract: Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a brand-name proprietary Chinese medicine (PCM). A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400 - 1880 & 2088 - 2346 nm. Using random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L⁻¹ and 0.888, respectively. The results indicate that the NIR predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantification of the polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
Funding: Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Abstract: With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped SNPs and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
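The Bonferroni correction applied to single-locus LR in the abstract above can be illustrated with a short sketch. The function name and the p-values below are hypothetical and are not data from the study; the sketch only shows why the corrected test is conservative when many SNPs are tested.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant under the Bonferroni correction.

    Each p-value is compared with alpha / m, where m is the number of tests,
    which bounds the family-wise type I error rate by alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical single-locus p-values for four SNPs in one SNP set.
snp_p_values = [0.001, 0.02, 0.04, 0.3]
flags = bonferroni_significant(snp_p_values)
```

With four tests the per-SNP threshold drops from 0.05 to 0.0125, so SNPs with nominal p-values of 0.02 and 0.04 are no longer declared significant.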
Abstract: Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber range of 4348 cm⁻¹ to 9091 cm⁻¹, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for the training set and 100% for the test set, and the lowest detected concentration of malathion residues in water was 1 μg·mL⁻¹. Kernel partial least squares-discriminant analysis performed well in classifying data from nonlinear systems. It was inferred that near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis has potential for the rapid screening of other pesticide residues in water.
Funding: National Natural Science Foundation of China (No. 40301038)
Abstract: In several LUCC studies, statistical methods are used to analyze land use data. A problem with using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a partial least-squares (PLS) regression approach is developed to study the relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and the influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also illustrate that the first PLS factor is best able to describe land use patterns quantitatively, and most of the statistical relations derived from it accord with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Funding: Supported by the National Key Scientific Instrument and Equipment Development Project of China (Grant No. 2013YQ220643) and the National 863 Program of China (Grant No. 2014AA06A503).
Abstract: As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Changes in VOC concentration should be a focus of real-time environmental monitoring systems. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select the optimal characteristic wavelengths of VOC spectra. This method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were built, respectively, on Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the characteristic wavelengths selected by the iPLS-MC method are reduced from 8916 to 10, which occupies only 0.22% of the full spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that the regression model can be simplified and its prediction performance significantly improved by using the characteristic wavelengths.
Abstract: Breast cancer is one of the malignant tumors with a high incidence in women. Its incidence has increased in all parts of the world since the twentieth century, but its etiology is not yet completely clear, so the detection of breast cells is very important. In this paper, we built a regression model to detect breast cells and, by training the model, obtained a method for predicting whether breast cells are benign or malignant. Using 10 features of breast cells, the prediction reached an accuracy of up to 93.67%. The method was very effective for predicting and analysing whether breast cells are cancerous, and it has an important role in the diagnosis and prevention of breast cancer.
Funding: The National Natural Science Foundation of China (41101395, 41071276, 31071324), the Beijing Municipal Natural Science Foundation, China (4122032), and the National Basic Research Program of China (2011CB311806).
Abstract: Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. By using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least square regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the content of chlorophyll. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features (R² = 0.77, relative root mean square error RRMSE = 0.28) outperformed the model composed of 14 optimal conventional SFs (R² = 0.69, RRMSE = 0.32) in estimating disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising in providing relatively accurate estimates of the disease severity of powdery mildew at the leaf level.
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (No. 2017YFE0301304), the National Natural Science Foundation of China (Nos. 11475039, 11605023, 11705020), the China Postdoctoral Science Foundation (Nos. 2016M591423, 2017T100172, 2018M630285), the Fundamental Research Funds for the Central Universities (Nos. DUT15RC(3)072, DUT17RC(4)53, DUT18LK38), and the Liaoning Provincial Natural Science Foundation of China (No. 20170540153).
Abstract: An experimental setup has been designed and realized in order to optimize the characteristics of a laser-induced breakdown spectroscopy system working in various pressure environments. An approach combining normalization methods with the partial least squares (PLS) method is developed for quantitative analysis of the molybdenum (Mo) element in the multi-component alloy that is the first wall material in the Experimental Advanced Superconducting Tokamak. In this study, different spectral normalization methods (total spectral area normalization, background normalization, and reference line normalization) are investigated for reducing the uncertainty and improving the accuracy of spectral measurement. The results indicate that the PLS approach based on inter-element interference is significantly better than the conventional PLS methods, as well as the univariate linear methods, for molybdenum element analysis at the various pressures.
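The three normalization schemes named in the abstract above might look like the following sketch. The abstract does not define them, so the definitions here are assumptions based on common usage: total area is approximated by the sum of intensities, background by the mean intensity in a line-free window, and the reference line by the intensity at one chosen index. All function names and the toy spectrum are hypothetical.

```python
def normalize_total_area(spectrum):
    """Divide every intensity by the summed intensity (discrete area)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


def normalize_background(spectrum, background_window):
    """Divide by the mean intensity inside an assumed line-free window."""
    window = spectrum[background_window]
    background = sum(window) / len(window)
    return [x / background for x in spectrum]


def normalize_reference_line(spectrum, ref_index):
    """Divide by the intensity of a chosen reference emission line."""
    reference = spectrum[ref_index]
    return [x / reference for x in spectrum]


# Toy spectrum: flat background of 2.0 with two lines at indices 2 and 3.
spectrum = [2.0, 2.0, 10.0, 4.0, 2.0]
by_area = normalize_total_area(spectrum)
by_background = normalize_background(spectrum, slice(0, 2))
by_reference = normalize_reference_line(spectrum, 2)
```

Each scheme rescales the whole spectrum by a single factor, so shot-to-shot fluctuations that affect all channels alike are reduced before the normalized spectra are passed to a PLS model.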
Abstract: Partial least squares discriminant analysis (PLS-DA) with integrated moving-window (MW) waveband screening was applied to the discriminant analysis of liquor brands with near-infrared (NIR) spectroscopy. Luzhou Laojiao, a popular liquor with a strong fragrant flavor, was used as the identified liquor brand (160 samples, negative, 52 vol alcoholicity). Liquors of 10 other brands with a strong fragrant flavor were used as the interferential brands (200 samples, positive, 52 vol alcoholicity). The Kennard-Stone algorithm was used for the division of modeling samples to achieve uniformity and representativeness. Based on MW-PLS-DA, a simplified optimal model set with 157 wavebands was further proposed. This set contained five types of wavebands corresponding to the NIR absorption bands of water, ethanol, and other micronutrients (i.e., acids, aldehydes, phenols, and aromatic compounds) in liquor for practical choice. Using five selected simple models with 4775 - 4239, 7804 - 6569, 6264 - 5844, 9435 - 7896, and 12066 - 10373 cm⁻¹, validation recognition rates of 99.3% or higher were obtained. The results show good prediction performance and low model complexity, and they also provide a valuable reference for designing small dedicated instruments. The proposed method is a promising tool for large-scale inspection of liquor food safety.
Funding: Supported by the National Natural Science Foundation of China (No. 30725045), the Shanghai Leading Academic Discipline Project (No. B906), and in part by the Scientific Foundation of Shanghai, China (Nos. 07DZ19728, 06DZ19717 and 06DZ19005).
Abstract: For quality control purposes, an approach for fingerprinting and simultaneous quantification of five major bioactive constituents of Rhizoma Coptidis was established via a high-performance liquid chromatograph coupled with a photodiode array UV detector (HPLC-DAD) and an electrospray ionization mass spectrometer (HPLC-ESI/MS). The compounds were identified by comparing their mass spectra with literature data and with those of standard samples, and were quantified by the HPLC-DAD method. Baseline separation was achieved on an XTerra C18 column (5 μm, 250 mm × 4.6 mm i.d.) with linear gradient elution of formate buffer (consisting of 0.5% formic acid, adjusted to pH = 4.5 with ammonia) and acetonitrile (consisting of 0.2% formic acid and 0.2% triethylamine). The method was validated for linearity (r² > 0.9995), repeatability (RSD < 3.1%), intra- and inter-day precision (RSD < 1.8%) with recovery (99.9%-105.1%), limits of detection (0.15-0.35 μg/mL), and limits of quantification (0.53-0.82 μg/mL). The similarities of 32 batches of Rhizoma Coptidis and their classification according to their manufacturers were based on the retention times and peak areas of the characteristic compounds. The five compounds were selected for quality assessment of Rhizoma Coptidis via partial least squares (PLS) analysis.
Abstract: High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. Such wine is not only rich in nutrients but also has a unique taste and a fragrant scent. Brand identification of wine is difficult and complex because of the high similarity among wines. In this paper, visible and near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent and two combined wavebands: visible (400 - 780 nm), short-NIR (780 - 1100 nm), long-NIR (1100 - 2498 nm), whole NIR (780 - 2498 nm) and whole scanning (400 - 2498 nm). In independent validation, all five models achieved good discriminant effects. Among them, the visible region model achieved the best effect. The recognition-accuracy rates in validation for negative, positive and total samples reached 100%, 95.6% and 97.5%, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 31870568), the Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (China) (No. ZR2019JQ10), the Major Program of the Shandong Province Natural Science Foundation (No. ZR2018ZB0208), and the "Transformational Technologies for Clean Energy and Demonstration" Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA21060201).
Abstract: In this study, multivariate analysis methods, including principal component analysis (PCA) and partial least square (PLS) analysis, were applied to reveal the inner relationship of the key variables in the process of H₂O₂-assisted Na₂CO₃ (HSC) pretreatment of corn stover. A total of 120 pretreatment experiments were implemented at the lab scale under different conditions by varying the particle size of the corn stover and the process variables. The results showed that the Na₂CO₃ dosage and pretreatment temperature had a strong influence on lignin removal, whereas pulp refining instrument (PFI) refining and Na₂CO₃ dosage played positive roles in the final total sugar yield. Furthermore, it was found that pretreatment conditions had a more significant impact on the improvement of pretreatment effectiveness than the properties of the raw corn stover. In addition, a prediction of the effectiveness of the corn stover HSC pretreatment based on a PLS analysis was conducted for the first time, and predictability tests based on additional pretreatment experiments showed that the developed PLS model achieved good predictive performance (particularly for the final total sugar yield), indicating that the developed PLS model can be used to predict the effectiveness of HSC pretreatment. Therefore, multivariate analysis can potentially be used to monitor and control the pretreatment process in future large-scale biorefinery applications.
Funding: Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR), Italy, for the financial support to this work, grant 60% 2014.
Abstract: The performance of different chemometric approaches was evaluated in the spectrophotometric determination of pharmaceutical mixtures whose components are present in very high ratios. Principal component regression (PCR), partial least squares with one dependent variable (PLS1) or multiple dependent variables (PLS2), and multivariate curve resolution (MCR) were applied to the spectral data of a ternary mixture containing paracetamol, sodium ascorbate and chlorpheniramine (150:140:1, m/m/m), and a quaternary mixture containing paracetamol, caffeine, phenylephrine and chlorpheniramine (125:6.25:1.25:1, m/m/m/m). The UV spectra of the calibration samples in the range of 200-320 nm were pre-treated by removing noise and useless data, and the wavelength regions having the most useful analytical information were selected using the regression coefficients calculated in the multivariate modeling. All the defined chemometric models were validated on external sample sets and then applied to commercial pharmaceutical formulations. Different data intervals, fixed at 0.5, 1.0, and 2.0 point/nm, were tested to optimize the prediction ability of the models. The best results were obtained using the PLS1 calibration models, and the quantification of the species present in lower amounts was significantly improved by adopting the 0.5 data interval, which showed accuracy between 94.24% and 107.76%.
Abstract: Near-infrared (NIR) spectroscopy combined with chemometric methods was applied to the rapid and reagent-free analysis of serum urea nitrogen (SUN). Multi-partition modeling was performed to achieve parameter stability. A large-scale cyclic and global parameter optimization platform for the Norris derivative filter (NDF), which has three parameters (the derivative order d, the number of smoothing points s and the number of differential gaps g), was developed with PLS regression. An adaptive analysis of the NDF parameters was also given, and it achieved a significantly better modeling effect than modeling without spectral pre-processing. After the interference wavebands of saturated absorption were eliminated, the modeling performance was further improved. In validation, the root mean square error (SEP), the correlation coefficient (RP) for prediction and the ratio of performance to deviation (RPD) were 1.66 mmol·L⁻¹, 0.966 and 4.7, respectively. The results showed that high-precision analysis of SUN was feasible based on NIR spectroscopy and Norris-PLS. The global optimization method for the NDF is also expected to be applicable to other analysis objects.
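The three validation statistics reported in the abstract above can be computed as in the sketch below. The abstract does not give formulas, so commonly used definitions are assumed: the root mean square error of prediction, the Pearson correlation between measured and predicted values, and RPD as the sample standard deviation of the measured values divided by that error. The function names and the numbers are hypothetical, not the study's data.

```python
import math
import statistics


def rmsep(measured, predicted):
    """Root mean square error of prediction over paired values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rpd(measured, predicted):
    """Assumed definition: stdev of measured values divided by RMSEP."""
    return statistics.stdev(measured) / rmsep(measured, predicted)


# Hypothetical reference and predicted concentrations for five samples.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
error = rmsep(measured, predicted)
correlation = pearson_r(measured, predicted)
ratio = rpd(measured, predicted)
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is reported alongside the error and the correlation coefficient.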
Funding: Supported by the Key Projects in the National Science & Technology Pillar Program, China (No. 2007BAI38B03), the Development Program of the Science and Technology of Jilin Province, China (Nos. 200705C07, 20075020), and the 11th Five-Year Key Project of Jilin Province Education Department, China (No. [2010]205).
Abstract: An optimal spectral region selection method based on grey correlation analysis was applied to the analysis of near-infrared (NIR) spectra. In order to compute the "characteristic" spectral region, 160 samples of tobacco were measured by NIR. Next, the whole spectral region was randomly divided into six regions, and the association coefficients and correlation orders of the different regions were computed for total sugar, reducing sugar and nicotine. The two regions with the largest association coefficients were regarded as the "characteristic" spectral region of a model. Finally, quantitative analysis models of the different components were established via the partial least squares method, and the common spectral region selection methods were compared. The simulation results indicate that the models that choose the spectral region based on grey correlation analysis are more effective than the common spectral region selection methods: the optimization time of the algorithm is shorter, the prediction precision of the models is higher, and the generalization ability of the quantitative analysis results is stronger. This research can support quantitative analysis models of NIR spectra and offers a new idea for commercial NIR analysis software, so it has high application value in the analysis of NIR spectra.
Abstract: In this study, two functional logistic regression models, one with a functional principal component basis (FPCA) and one with a functional partial least squares basis (FPLS), have been developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purpose of classification and interpretation. The classification performances of the two functional models have been compared with those of two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The results indicated that the FPCA and FPLS models outperformed the PCDA and PLSDA models in classification while using a small number of functional basis components. With a substantial reduction in model complexity and an improvement in classification accuracy, the functional approach is particularly helpful for interpreting the complex spectral features related to precancerous colon polyps.