The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde and methyl salicylate overlap severely; therefore, it is impossible to determine them in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples are analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3% and 99.1%, and the relative standard deviations (RSD) are 1.87%, 1.98%, 1.94%, 0.960% and 0.672%, respectively.
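The recovery and RSD figures quoted in this abstract can be illustrated with a small sketch. It assumes that recovery is the found amount divided by the added amount, and that the RSD is the sample standard deviation of the recoveries relative to their mean; the abstract does not spell out either definition. The function name `recovery_stats` and the numbers are made up for illustration.

```python
from statistics import mean, stdev


def recovery_stats(added, found):
    """Return (mean recovery %, RSD %) for paired added/found amounts.

    Assumed definitions (not taken from the paper):
      recovery_i = 100 * found_i / added_i
      RSD        = 100 * sample_sd(recoveries) / mean(recoveries)
    """
    recoveries = [100.0 * f / a for a, f in zip(added, found)]
    avg = mean(recoveries)
    rsd = 100.0 * stdev(recoveries) / avg
    return avg, rsd


# Hypothetical amounts for one compound in three synthetic mixtures.
added = [10.0, 20.0, 40.0]
found = [9.9, 20.2, 39.6]
mean_rec, rsd = recovery_stats(added, found)
print(f"mean recovery = {mean_rec:.2f}%, RSD = {rsd:.2f}%")
```

In the study itself, the "found" amounts would come from the PLS calibration model rather than from hand-entered values.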
Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) rice from wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
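The conservativeness of Bonferroni-corrected single-locus testing mentioned in this abstract can be illustrated with a short sketch. It shows only the generic Bonferroni rule (compare each p-value with alpha divided by the number of tests); the function name `bonferroni_reject` and the p-values are hypothetical and are not taken from the study.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test as significant under the Bonferroni rule.

    A test is rejected only if p <= alpha / m, where m is the number of
    tests (here, the number of SNPs in the set).
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical single-SNP p-values within one SNP set.
p_values = [0.001, 0.02, 0.04, 0.3]
uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_reject(p_values)
print(uncorrected)  # three of four pass the unadjusted 0.05 level
print(corrected)    # only the smallest survives alpha / 4 = 0.0125
```

The threshold shrinks as the SNP set grows, which is one reason a set-level test on a few components may retain more power than many corrected single-locus tests.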
In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when observations are few. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region and illustrate that the first PLS factor is able to describe land use patterns quantitatively, with most of the statistical relations derived from it according with the facts. As the explanatory capacity of the PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, the least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It performed significantly better than variable selection based only on empirical knowledge and in most cases outperformed the baseline method. On all three datasets, the hybrid strategy combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables (1.959, 3.718 and 2.181, respectively). The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. These results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
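The combine-then-deselect idea in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it assumes a single data-driven filter (ranking variables by absolute Pearson correlation with the target), whereas the abstract lists four, and the function names, column indices and data are all hypothetical.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def hybrid_select(X, y, k, empirical, deselect=()):
    """Union of the top-k correlated columns and the empirical columns,
    minus any columns removed by hand (variable deselection)."""
    n_vars = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_vars)]
    top_k = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]
    chosen = set(top_k) | set(empirical)
    return sorted(chosen - set(deselect))


# Toy data: 5 samples x 4 spectral variables, y stands in for ash content.
X = [
    [1.1, 3.0, 5.0, 2.0],
    [2.0, 1.0, 4.0, 2.0],
    [2.9, 4.0, 3.0, 3.0],
    [4.2, 1.0, 2.0, 2.0],
    [5.0, 5.0, 1.0, 2.0],
]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

selected = hybrid_select(X, y, k=2, empirical=[3])
after_deselection = hybrid_select(X, y, k=2, empirical=[3], deselect=[2])
print(selected, after_deselection)
```

The selected column indices would then be passed to a PLSR model; column 3 is kept here only because it is declared empirical, since its correlation with `y` is zero in this toy data.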
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance in rice (Oryza sativa L.) was measured on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan) over the wavelength range from 350 to 2 500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Statistical methods including multiple stepwise regression, principal component analysis and partial least-square regression were used to calculate and estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-square regression with seven extracted factors predicted disease severity most effectively among the statistical methods compared, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.
Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and the biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and the pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. Using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least square regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the content of chlorophyll. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features outperformed (R²=0.77, relative root mean square error RRMSE=0.28) the model composed of 14 optimal conventional SFs (R²=0.69, RRMSE=0.32) in estimating the disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising for providing relatively accurate estimates of the disease severity of powdery mildew at the leaf level.
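The two accuracy measures quoted in this abstract can be computed as in the sketch below. The definitions are assumptions, since the abstract does not state them: R² is taken as one minus the ratio of residual to total sum of squares, and RRMSE as the RMSE divided by the mean of the observed values. The function name `r2_and_rrmse` and the data are illustrative only.

```python
def r2_and_rrmse(observed, predicted):
    """Return (R^2, RRMSE) under the assumed definitions:
    R^2   = 1 - SS_res / SS_tot
    RRMSE = RMSE / mean(observed)
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = (ss_res / n) ** 0.5
    return 1.0 - ss_res / ss_tot, rmse / mean_obs


# Hypothetical disease-severity values (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rrmse = r2_and_rrmse(observed, predicted)
print(f"R2 = {r2:.3f}, RRMSE = {rrmse:.3f}")
```

Other conventions exist (for example, R² as a squared correlation coefficient), so figures computed this way are not guaranteed to be comparable with those in the paper.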
China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple quality is affected by meteorological conditions, soil types, nutrient content of soil, and management practices. Meteorological factors such as light, temperature and moisture are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the quality of Fuji apple and to provide evidence for a reasonable regional layout and planting of Fuji apple in China. Fruit samples of Fuji apple and meteorological data were investigated from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit quality in model effect loadings and weights, followed by the mean annual temperature, the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality. The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in descending order of influence, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological factors for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0% during the growing period (from April to October).
A rapid quantitative analytical method for three components of a water-extracted Lonicerae Japonicae Flos (Lonicera japonica Thunb.) solution was developed using near-infrared (NIR) spectroscopy and the partial least-squares (PLS) method. The NIR spectra of 81 samples collected from a production line were obtained. The concentrations of secologanic acid, chlorogenic acid and galuteolin were determined using high-performance liquid chromatography-diode array detection as the reference method. Several pretreatment methods for the NIR spectra were used during PLS calibration. The most appropriate number of PLS latent variables was selected based on the standard error of cross-validation (SECV). The performance of the final PLS models was evaluated according to SECV, standard error of prediction (SEP) and determination coefficient (R²). The compounds secologanic acid, chlorogenic acid and galuteolin had SEP values of 0.030, 0.061 and 1.668 μg/mL, respectively, and R² values over 0.85. This work shows that NIR spectroscopy is a rapid and convenient method for the analysis of water-extracted Lonicerae Japonicae Flos solution. The proposed method can help in the application of process analytical technology in the pharmaceutical industry, particularly for traditional Chinese medicine injections.
To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable, the economic loss calculated from the decrease in yield, related to the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻ and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Evaluating the performance of this prediction model by cross validation indicated that the optimum number of principal components was 3, determined by the minimum of the prediction residual error sum of squares, and that the prediction error of the regression equation ranged from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and with the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻ and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺ and Mg²⁺. The precision of the model may be improved if the non-linearity of the original data is addressed.
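Choosing the number of components by the minimum prediction residual error sum of squares (PRESS), as this abstract describes, can be sketched as follows. This is a minimal illustration under stated assumptions: a single-response PLS (PLS1) fitted with a NIPALS-style deflation, leave-one-out cross validation, and synthetic data. The function names and numbers are hypothetical and do not reproduce the paper's model or its 12 data groups.

```python
def pls1_fit(X, y, n_comp):
    """Fit single-response PLS by successive deflation of centered X."""
    n, m = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction of maximal covariance with y.
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(m)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate X so the next component is orthogonal to this one.
        Xc = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    xc = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(xc, w))
        y_hat += q * t
        xc = [v - t * pj for v, pj in zip(xc, p)]
    return y_hat


def loo_press(X, y, max_comp):
    """Leave-one-out PRESS for 1..max_comp components."""
    press = {}
    for a in range(1, max_comp + 1):
        total = 0.0
        for i in range(len(X)):
            model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], a)
            total += (y[i] - pls1_predict(model, X[i])) ** 2
        press[a] = total
    return press


# Synthetic data: y is an exact linear function of two predictors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in X]
press = loo_press(X, y, max_comp=2)
best = min(press, key=press.get)
print(press, "-> best number of components:", best)
```

Because the toy response is exactly linear in both predictors, two components drive PRESS to essentially zero; with noisy, collinear real data the minimum typically falls at fewer components than predictors.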
In this study, two functional logistic regression models, one with a functional principal component basis (FPCA) and one with a functional partial least squares basis (FPLS), were developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purpose of classification and interpretation. The classification performances of the two functional models were compared with those of two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The results indicated that the FPCA and FPLS models outperformed the PCDA and PLSDA models in classification while using a small number of functional basis components. With a substantial reduction in model complexity and an improvement in classification accuracy, the functional approach is particularly helpful for interpreting the complex spectral features related to precancerous colon polyps.
This study evaluates the operational performance of all routes of Sajha Bus Yatayat operating inside Kathmandu valley using Data Envelopment Analysis (DEA) in terms of efficiency and effectiveness scores. This approach allows us to assess the relative performance of a transit system in the absence of historical data and prior research to compare with. To explore the possibility of enhancing performance, scenarios were created for relatively underperforming routes and for the long route problem by changing the most important input variable and adjusting the output variables accordingly with a regression model where relevant. Partial Least Squares (PLS) regression was used to determine the input variables most influential on the output variables. DEA was conducted to assess the performance of all routes under these scenarios. Under the first set of scenarios, the underperforming routes, except the longest route, become more efficient without considerable negative deviation in effectiveness. The results of the second set of scenarios, for the long route problem, suggest that the longest route's performance can be enhanced significantly with proper route alignment. Scenario development and evaluation can help transit companies explore strategies for enhancing operational performance.
The objective of this paper is to present a review of different calibration and classification methods for functional data in the context of chemometric applications. In chemometrics, it is usual to measure certain parameters in terms of a set of spectrometric curves that are observed at a finite set of points (functional data). Although the predictor variable is clearly functional, this problem is usually solved by using multivariate calibration techniques that consider it as a finite set of variables associated with the observed points (wavelengths or times). But these explanatory variables are highly correlated, and it is therefore more informative to first reconstruct the true functional form of the predictor curves. Although several articles on the implementation of functional data analysis techniques in chemometrics have been published, their power to solve real problems is not yet well known. For this reason, the extension of multivariate calibration techniques (linear regression, principal component regression and partial least squares) and classification methods (linear discriminant analysis and logistic regression) to the functional domain, together with some relevant chemometric applications, is reviewed in this paper.
Rapid and sensitive recognition of herbal pieces according to their concocted processing is crucial to quality control and pharmaceutical effect. Near-infrared (NIR) and mid-infrared (MIR) technology combined with supervised pattern recognition based on partial least-squares discriminant analysis (PLSDA) was used to classify and recognize 600 Areca catechu L. samples of six differently processed (concocted) types, and the influence of fingerprint preprocessing methods on recognition performance was also investigated. Recognition rates of 99.24%, 100% and 99.49% were achieved by PLSDA models for the original fingerprint, the multiplicative scatter correction (MSC) fingerprint and the second derivative fingerprint of the NIR spectra, respectively. Meanwhile, a perfect recognition rate of 100% was obtained for the same three fingerprint models of the MIR spectra. In conclusion, PLSDA can rapidly and effectively extract the differences in fingerprint information from NIR and MIR spectra to identify differently concocted herbal pieces of A. catechu.
This thesis presents the general concept of the coefficient of partial correlation. Starting from regression analysis and using samples, the paper derives the general formula that expresses the coefficient of partial correlation in terms of simple correlation coefficients.
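As an illustration of expressing a partial correlation through simple correlations, the sketch below uses the standard first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). Whether the thesis uses exactly this notation or extends it to higher orders is not stated in the abstract; the function name `partial_corr` and the input values are hypothetical.

```python
def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z,
    computed from the three simple (pairwise) correlation coefficients."""
    numerator = r_xy - r_xz * r_yz
    denominator = ((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2)) ** 0.5
    return numerator / denominator


# Hypothetical simple correlations.
r = partial_corr(r_xy=0.5, r_xz=0.3, r_yz=0.4)
print(round(r, 4))
```

Higher-order partial correlations can be obtained by applying the same formula recursively, with lower-order partial correlations taking the place of the simple ones.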
We consider a functional partially linear additive model that predicts a functional response from a scalar predictor and functional predictors. B-spline and eigenbasis least squares estimators for both the parametric and the nonparametric components are proposed. Finally, we obtain the variance decomposition of the model and establish the asymptotic convergence rate of the estimators.
Funding (NIRS discrimination of transgenic rice): Supported by projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Funding (PC-LR and PLS-LR in GWAS): Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Funding (PLS regression in land use analysis): National Natural Science Foundation of China, No. 40301038.
Funding (hybrid variable selection for LIBS): Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Funding (rice brown spot severity estimation): The Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203) and the National Science and Technology Task Force Project (No. 2006BAD10A01), China.
Funding: the National Natural Science Foundation of China (41101395, 41071276, 31071324); the Beijing Municipal Natural Science Foundation, China (4122032); the National Basic Research Program of China (2011CB311806)
Abstract: Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants, and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and the biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and the pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. By using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least square regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the content of chlorophyll. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features outperformed (R²=0.77, relative root mean square error RRMSE=0.28) the model composed of 14 optimal conventional SFs (R²=0.69, RRMSE=0.32) in estimating the disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising in providing relatively accurate estimates of the disease severity of powdery mildew at the leaf level.
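The accuracy metrics quoted in this abstract can be computed along the following lines. This is a minimal sketch, assuming that RRMSE means the RMSE divided by the mean of the observed values and that R² is the usual coefficient of determination; the abstract does not state which definitions the authors used, and the function names are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """Relative RMSE, assumed here to be RMSE divided by the observed mean."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under these definitions a lower RRMSE and a higher R² both indicate a better fit, which matches how the two models are compared in the abstract.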
Funding: supported by the Forest Scientific Research in the Public Interest, China (201404720); the earmarked fund for the China Agriculture Research System (CARS-27); the Beijing Municipal Education Commission, China (CEFF-PXM2017_014207_000043)
Abstract: China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple quality is affected by meteorological conditions, soil types, soil nutrient content, and management practices. Meteorological factors such as light, temperature and moisture are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the quality of Fuji apples and to provide evidence for a reasonable regional layout and planting of Fuji apple in China. Fruit samples and meteorological data were collected from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple-producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit quality in the model effect loadings and weights, followed by the mean annual temperature, the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality. The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in descending order of influence, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological conditions for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0% during the growing period (from April to October).
Funding: Financial support was received from the National High-tech Industry Development Project of the National Development and Reform Commission (No. 2007-2490).
Abstract: A rapid quantitative analytical method for three components of a water-extracted Lonicerae Japonicae Flos (Lonicera japonica Thunb.) solution was developed using near-infrared (NIR) spectroscopy and the partial least-squares (PLS) method. The NIR spectra of 81 samples collected from a production line were obtained. The concentrations of secologanic acid, chlorogenic acid and galuteolin were determined using high-performance liquid chromatography-diode array detection as the reference method. Several pretreatment methods for the NIR spectra were used during PLS calibration. The most appropriate number of latent variables (PLS factors) was selected based on the standard error of cross-validation (SECV). The performance of the final PLS models was evaluated according to the SECV, the standard error of prediction (SEP) and the determination coefficient (R²). The compounds secologanic acid, chlorogenic acid and galuteolin had SEP values of 0.030, 0.061 and 1.668 μg/mL, respectively, and R² values over 0.85. This work shows that NIR spectroscopy is a rapid and convenient method for the analysis of water-extracted Lonicerae Japonicae Flos solution. The proposed method can help in the application of process analytical technology in the pharmaceutical industry, particularly for traditional Chinese medicine injections.
Funding: Funded by the National Basic Research Program of China under grant No. 2005CB422207.
Abstract: To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable, the economic loss calculated from the decrease in yield, related to the pH value and the levels of Ca2+, NH4+, Na+, K+, Mg2+, SO42-, NO3- and Cl- in acid rain. We selected vegetables that were sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Cross-validation of the prediction model indicated that the optimum number of principal components was 3, determined by the minimum of the prediction residual error sum of squares (PRESS), and that the prediction error of the regression equation ranged from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and the concentrations of NH4+, SO42-, NO3- and Cl- in the rain, and positively correlated with the concentrations of Ca2+, Na+, K+ and Mg2+. The precision of the model may be improved if the non-linearity of the original data is addressed.
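Choosing the number of PLS components by minimizing a cross-validated prediction residual sum of squares, as this abstract describes, can be sketched as below. The sketch assumes a single-response PLS fitted with the NIPALS algorithm and leave-one-out cross-validation; the abstract does not say which PLS algorithm or cross-validation scheme was used, and the function names are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (NIPALS); returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the extracted component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


def loo_press(X, y, n_components):
    """Leave-one-out prediction residual error sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press


def best_n_components(X, y, max_components):
    """Number of components (1..max_components) with the smallest PRESS."""
    return min(range(1, max_components + 1), key=lambda a: loo_press(X, y, a))
```

With only a handful of samples, as in the study described, leave-one-out is a common choice because it keeps as much data as possible for each fit, though the resulting PRESS estimate can be noisy.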
Abstract: In this study, two functional logistic regression models, one with a functional principal component basis (FPCA) and one with a functional partial least squares basis (FPLS), were developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purposes of classification and interpretation. The classification performances of the two functional models were compared with those of two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The results indicated that the FPCA and FPLS models outperformed the PCDA and PLSDA models while using only a small number of functional basis components. With a substantial reduction in model complexity and improved classification accuracy, the functional approach is particularly helpful for interpreting the complex spectral features related to precancerous colon polyps.
Abstract: This study evaluates the operational performance of all routes of Sajha Bus Yatayat operating inside the Kathmandu valley using Data Envelopment Analysis (DEA) in terms of efficiency and effectiveness scores. This approach allows us to assess the relative performance of a transit system in the absence of historical data and prior research to compare with. To explore the possibility of enhancing performance, scenarios were created for the relatively underperforming routes and for the long route problem by changing the most important input variable and adjusting the output variables accordingly, using a regression model where relevant. Partial Least Squares (PLS) regression was used to determine the input variables with the greatest influence on the output variables. DEA was conducted to assess the performance of all routes under these scenarios. Under the first set of scenarios, the underperforming routes, except the longest route, turn out to perform more efficiently without considerable negative deviation in effectiveness. The results of the second set of scenarios, for the long route problem, suggest that the longest route's performance can be enhanced significantly with proper route alignment. Scenario development and evaluation can help transit companies explore strategies for enhancing operational performance.
Abstract: The objective of this paper is to present a review of different calibration and classification methods for functional data in the context of chemometric applications. In chemometrics, it is usual to measure certain parameters in terms of a set of spectrometric curves that are observed at a finite set of points (functional data). Although the predictor variable is clearly functional, this problem is usually solved by using multivariate calibration techniques that treat it as a finite set of variables associated with the observed points (wavelengths or times). However, these explanatory variables are highly correlated, and it is therefore more informative to first reconstruct the true functional form of the predictor curves. Although several articles on the implementation of functional data analysis techniques in chemometrics have been published, their power to solve real problems is not yet well known. For this reason, this paper reviews the extension of multivariate calibration techniques (linear regression, principal component regression and partial least squares) and classification methods (linear discriminant analysis and logistic regression) to the functional domain, together with some relevant chemometric applications.
Funding: supported by the National Natural Science Foundation of China (Nos. 21205145, 21276006, 21036009); the Open Funds of State Key Laboratory of Chemo/Biosensing and Chemometrics of Hunan University (No. 201111); the Special Fund for Basic Scientific Research of Central Colleges, South-Central University for Nationalities (Nos. CZZ10005 and CZQ11012); the 'Five-twelfth' National Science and Technology Support Program (No. 2012BAI27B00)
Abstract: Rapid and sensitive recognition of herbal pieces according to their concocted processing is crucial to quality control and pharmaceutical effect. In this work, near-infrared (NIR) and mid-infrared (MIR) technology combined with supervised pattern recognition based on partial least-squares discriminant analysis (PLSDA) was used to classify and recognize 600 Areca catechu L. samples from six different types of concocted processing, and the influence of fingerprint preprocessing methods on recognition performance was also investigated. Recognition rates of 99.24%, 100% and 99.49% were achieved by the PLSDA models for the original fingerprint, the multiplicative scatter correction (MSC) fingerprint and the second derivative fingerprint of the NIR spectra, respectively. Meanwhile, a perfect recognition rate of 100% was obtained for the same three fingerprint models of the MIR spectra. In conclusion, PLSDA can rapidly and effectively extract distinguishing fingerprint information from NIR and MIR spectra to identify different concocted herbal pieces of A. catechu.
Abstract: This paper presents the general concept of the coefficient of partial correlation. Starting from regression analysis and using sample data, it derives the general formula that expresses the coefficient of partial correlation in terms of simple correlation coefficients.
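For orientation, the standard first-order case of such a formula is given here as a worked example. It is the textbook result, not a quotation from the paper, and the symbols x, y, z and the numerical values are chosen purely for illustration. It expresses the partial correlation of x and y, controlling for z, through the three simple correlation coefficients; higher orders follow by applying the same form recursively.

```latex
% First-order partial correlation from simple correlations
r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}
                     {\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}

% Recursion for one additional control variable w
r_{xy\cdot zw} = \frac{r_{xy\cdot z} - r_{xw\cdot z}\, r_{yw\cdot z}}
                      {\sqrt{\left(1 - r_{xw\cdot z}^{2}\right)\left(1 - r_{yw\cdot z}^{2}\right)}}

% Illustrative numbers: r_{xy} = 0.8,\ r_{xz} = 0.6,\ r_{yz} = 0.5
r_{xy\cdot z} = \frac{0.8 - 0.6 \times 0.5}{\sqrt{(1 - 0.36)(1 - 0.25)}}
              = \frac{0.5}{\sqrt{0.48}} \approx 0.722
```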
Abstract: We consider a functional partially linear additive model that predicts a functional response from a scalar predictor and functional predictors. B-spline and eigenbasis least squares estimators are proposed for both the parametric and the nonparametric components. Finally, we obtain the variance decomposition of the model and establish the asymptotic convergence rate of the estimators.