Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi...Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene (OsTCTP) and regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000-8 000 cm-1 and 4 000-10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.展开更多
Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperatur...Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.展开更多
The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the exper...The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component is all in the range from 98.9% to 110.3% , which means the partial least squares regression spectrophotometry can circumvent the overlappirtg of absorption spectrums of mlulti-components, so that sctisfactory results can be obtained without any scrapple pre-separation.展开更多
The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarde...The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.展开更多
With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistica...With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR), partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor- mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis. On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.展开更多
In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically ind...In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. But in fact, they have the tendency to be dependent, a phenomenon known as multicollinearity, especially in the cases of few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, meanwhile illustrate that the first PLS factor has enough ability to best describe land use patterns quantitatively, and most of the statistical relations derived from it accord with the fact. By the decreasing capacity of the PLS factors, the reliability of model outcome decreases correspondingly.展开更多
During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regress...During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regression model which based on least square method distortion.And the stability of the model will be lost.The model will be built based on partial least square regression in the paper,through applying the idea of main component analyze and typical correlation analyze,the writer picks up some component from original material.Thus,the writer builds up the model of rice evapotranspiration to solve the multiple correlation among the independent variables (some weather factors).At last,the writer analyses the model in some parts,and gains the satisfied result.展开更多
Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this pap...Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this paper, we present a quantum partial least squares(QPLS) regression algorithm. To solve the high time complexity of the PLS regression, we design a quantum eigenvector search method to speed up principal components and regression parameters construction. Meanwhile, we give a density matrix product method to avoid multiple access to quantum random access memory(QRAM)during building residual matrices. The time and space complexities of the QPLS regression are logarithmic in the independent variable dimension n, the dependent variable dimension w, and the number of variables m. This algorithm achieves exponential speed-ups over the PLS regression on n, m, and w. In addition, the QPLS regression inspires us to explore more potential quantum machine learning applications in future works.展开更多
Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map...Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.展开更多
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, measurement of hyperspectral leaf reflectance in rice crop (Oryzasativa L.) was conducted on groups of hea...Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, measurement of hyperspectral leaf reflectance in rice crop (Oryzasativa L.) was conducted on groups of healthy and infected leaves by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda. de Hann) through the wavelength range from 350 to 2 500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Statistical methods like multiple stepwise regression, principal component analysis and partial least-square regression were utilized to calculate and estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regressions could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for training (n=210) and testing (n=53) dataset were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted a disease severity with RMSEs of 16.3% and 13.9% for the training and testing dataset, respec-tively. Partial least-square regression with seven extracted factors could most effectively predict disease severity compared with other statistical methods with RMSEs of 4.1% and 2.0% for the training and testing dataset, respectively. Our research demon-strates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.展开更多
Boosting algorithms are a class of general methods used to improve the general periormance of regression analysis. The main idea is to maintain a distribution over the train set. In order to use the given distribution...Boosting algorithms are a class of general methods used to improve the general periormance of regression analysis. The main idea is to maintain a distribution over the train set. In order to use the given distribution directly, a modified PLS algorithm is proposed and used as the base learner to deal with the nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm has better general performance over the PLS algorithm.展开更多
In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to...In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to be estimated,random errorsε_(i)are(α,β)-mix_(i)ng random variables.The p-th(p>1)mean consistency,strong consistency and complete consistency for least squares estimators ofβ^(*)and g(·)are investigated under some mild conditions.In addition,a numerical simulation is carried out to study the finite sample performance of the theoretical results.Finally,a real data analysis is provided to further verify the effect of the model.展开更多
Boreal forests play an important role in global environment systems. Understanding boreal forest ecosystem structure and function requires accurate monitoring and estimating of forest canopy and biomass. We used parti...Boreal forests play an important role in global environment systems. Understanding boreal forest ecosystem structure and function requires accurate monitoring and estimating of forest canopy and biomass. We used partial least square regression (PLSR) models to relate forest parameters, i.e. canopy closure density and above ground tree biomass, to Landsat ETM+ data. The established models were optimized according to the variable importance for projection (VIP) criterion and the bootstrap method, and their performance was compared using several statistical indices. All variables selected by the VIP criterion passed the bootstrap test (p〈0.05). The simplified models without insignificant variables (VIP 〈1) performed as well as the full model but with less computation time. The relative root mean square error (RMSE%) was 29% for canopy closure density, and 58% for above ground tree biomass. We conclude that PLSR can be an effective method for estimating canopy closure density and above ground biomass.展开更多
Statistical downscaling (SD) analyzes relationship between local-scale response and global-scale predictors. The SD model can be used to forecast rainfall (local-scale) using global-scale precipitation from global cir...Statistical downscaling (SD) analyzes relationship between local-scale response and global-scale predictors. The SD model can be used to forecast rainfall (local-scale) using global-scale precipitation from global circulation model output (GCM). The objectives of this research were to determine the time lag of GCM data and build SD model using PCR method with time lag of the GCM precipitation data. The observations of rainfall data in Indramayu were taken from 1979 to 2007 showing similar patterns with GCM data on 1st grid to 64th grid after time shift (time lag). The time lag was determined using the cross-correlation function. However, GCM data of 64 grids showed multicollinearity problem. This problem was solved by principal component regression (PCR), but the PCR model resulted heterogeneous errors. PCR model was modified to overcome the errors with adding dummy variables to the model. Dummy variables were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction. The SD model with lag-GCM predictors was also better than SD model without lag-GCM.展开更多
Near infrared (NIR) hyperspectral imaging measurement of sugar content in peach was introauced. NIR spectral images (650~1 000 nm, resolution: 2 nm) of peach samples were captured with developed hyperspectral im...Near infrared (NIR) hyperspectral imaging measurement of sugar content in peach was introauced. NIR spectral images (650~1 000 nm, resolution: 2 nm) of peach samples were captured with developed hyperspectral imaging setup. Partial least square (PLS) regression prediction model was developed to estimate the sugar content in peach; step-wise backward method was utilized to determine optimal wavelength subsets. Experimental results show that the calibration model with optimal wavelength subsets has a correlation coefficient of prediction of 0.97 and a standard error of prediction of 0.19, the prediction accuracy is higher than the calibration model applied over the whole wavelength, which proves that variable selection plays an important role in improving the prediction accuracy of PLS regression model.展开更多
Accurate load prediction plays an important role in smart power management system, either for planning, facing the increasing of load demand, maintenance issues, or power distribution system. In order to achieve a rea...Accurate load prediction plays an important role in smart power management system, either for planning, facing the increasing of load demand, maintenance issues, or power distribution system. In order to achieve a reasonable prediction, authors have applied and compared two features extraction technique presented by kernel partial least square regression and kernel principal component regression, and both of them are carried out by polynomial and Gaussian kernels to map the original features’ to high dimension features’ space, and then draw new predictor variables known as scores and loadings, while kernel principal component regression draws the predictor features to construct new predictor variables without any consideration to response vector. In contrast, kernel partial least square regression does take the response vector into consideration. Models are simulated by three different cities’ electric load data, which used historical load data in addition to weekends and holidays as common predictor features for all models. On the other hand temperature has been used for only one data as a comparative study to measure its effect. Models’ results evaluated by three statistic measurements, show that Gaussian Kernel Partial Least Square Regression offers the more powerful features and significantly can improve the load prediction performance than other presented models.展开更多
基金supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No.2010R50028)the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No.2006BAK02A18)
文摘Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene (OsTCTP) and regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000-8 000 cm-1 and 4 000-10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
基金the National Natural Science Foundation of China (41171281, 40701120)the Beijing Nova Program, China (2008B33)
文摘Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.
文摘The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component is all in the range from 98.9% to 110.3% , which means the partial least squares regression spectrophotometry can circumvent the overlappirtg of absorption spectrums of mlulti-components, so that sctisfactory results can be obtained without any scrapple pre-separation.
基金Supported by National Natural Science Foundation of China (No.50478086)Tianjin Special Scientific Innovation Foundation (No.06FZZDSH00900)
文摘The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.
基金founded by the National Natural Science Foundation of China(81202283,81473070,81373102 and81202267)Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China(10KJA330034 and11KJA330001)+1 种基金the Research Fund for the Doctoral Program of Higher Education of China(20113234110002)the Priority Academic Program for the Development of Jiangsu Higher Education Institutions(Public Health and Preventive Medicine)
文摘With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR), partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor- mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis. On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
基金National Natural Science Foundation of China No.40301038
文摘In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. But in fact, they have the tendency to be dependent, a phenomenon known as multicollinearity, especially in the cases of few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, meanwhile illustrate that the first PLS factor has enough ability to best describe land use patterns quantitatively, and most of the statistical relations derived from it accord with the fact. By the decreasing capacity of the PLS factors, the reliability of model outcome decreases correspondingly.
文摘During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regression model which based on least square method distortion.And the stability of the model will be lost.The model will be built based on partial least square regression in the paper,through applying the idea of main component analyze and typical correlation analyze,the writer picks up some component from original material.Thus,the writer builds up the model of rice evapotranspiration to solve the multiple correlation among the independent variables (some weather factors).At last,the writer analyses the model in some parts,and gains the satisfied result.
基金Project supported by the Fundamental Research Funds for the Central Universities, China (Grant No. 2019XD-A02)the National Natural Science Foundation of China (Grant Nos. U1636106, 61671087, 61170272, and 92046001)+2 种基金Natural Science Foundation of Beijing Municipality, China (Grant No. 4182006)Technological Special Project of Guizhou Province, China (Grant No. 20183001)the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (Grant Nos. 2018BDKFJJ016 and 2018BDKFJJ018)。
文摘Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this paper, we present a quantum partial least squares(QPLS) regression algorithm. To solve the high time complexity of the PLS regression, we design a quantum eigenvector search method to speed up principal components and regression parameters construction. Meanwhile, we give a density matrix product method to avoid multiple access to quantum random access memory(QRAM)during building residual matrices. The time and space complexities of the QPLS regression are logarithmic in the independent variable dimension n, the dependent variable dimension w, and the number of variables m. This algorithm achieves exponential speed-ups over the PLS regression on n, m, and w. In addition, the QPLS regression inspires us to explore more potential quantum machine learning applications in future works.
文摘Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.
基金the Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203)the National Scienceand Technology Task Force Project (No. 2006BAD10A01), China
文摘Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, measurement of hyperspectral leaf reflectance in rice crop (Oryzasativa L.) was conducted on groups of healthy and infected leaves by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda. de Hann) through the wavelength range from 350 to 2 500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Statistical methods like multiple stepwise regression, principal component analysis and partial least-square regression were utilized to calculate and estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regressions could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for training (n=210) and testing (n=53) dataset were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted a disease severity with RMSEs of 16.3% and 13.9% for the training and testing dataset, respec-tively. Partial least-square regression with seven extracted factors could most effectively predict disease severity compared with other statistical methods with RMSEs of 4.1% and 2.0% for the training and testing dataset, respectively. Our research demon-strates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.
基金This work was supported by the National High-tech Research and Development Program of China (No. 2003AA412110).
文摘Boosting algorithms are a class of general methods used to improve the general periormance of regression analysis. The main idea is to maintain a distribution over the train set. In order to use the given distribution directly, a modified PLS algorithm is proposed and used as the base learner to deal with the nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm has better general performance over the PLS algorithm.
基金Supported by the National Social Science Foundation of China(Grant No.22BTJ059)。
文摘In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to be estimated,random errorsε_(i)are(α,β)-mix_(i)ng random variables.The p-th(p>1)mean consistency,strong consistency and complete consistency for least squares estimators ofβ^(*)and g(·)are investigated under some mild conditions.In addition,a numerical simulation is carried out to study the finite sample performance of the theoretical results.Finally,a real data analysis is provided to further verify the effect of the model.
基金supported by the 948 Program of the State Forestry Administration (2009-4-43)the National Natura Science Foundation of China (No.30870420)
文摘Boreal forests play an important role in global environment systems. Understanding boreal forest ecosystem structure and function requires accurate monitoring and estimating of forest canopy and biomass. We used partial least square regression (PLSR) models to relate forest parameters, i.e. canopy closure density and above ground tree biomass, to Landsat ETM+ data. The established models were optimized according to the variable importance for projection (VIP) criterion and the bootstrap method, and their performance was compared using several statistical indices. All variables selected by the VIP criterion passed the bootstrap test (p〈0.05). The simplified models without insignificant variables (VIP 〈1) performed as well as the full model but with less computation time. The relative root mean square error (RMSE%) was 29% for canopy closure density, and 58% for above ground tree biomass. We conclude that PLSR can be an effective method for estimating canopy closure density and above ground biomass.
文摘Statistical downscaling (SD) analyzes relationship between local-scale response and global-scale predictors. The SD model can be used to forecast rainfall (local-scale) using global-scale precipitation from global circulation model output (GCM). The objectives of this research were to determine the time lag of GCM data and build SD model using PCR method with time lag of the GCM precipitation data. The observations of rainfall data in Indramayu were taken from 1979 to 2007 showing similar patterns with GCM data on 1st grid to 64th grid after time shift (time lag). The time lag was determined using the cross-correlation function. However, GCM data of 64 grids showed multicollinearity problem. This problem was solved by principal component regression (PCR), but the PCR model resulted heterogeneous errors. PCR model was modified to overcome the errors with adding dummy variables to the model. Dummy variables were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction. The SD model with lag-GCM predictors was also better than SD model without lag-GCM.
文摘Near infrared (NIR) hyperspectral imaging measurement of sugar content in peach was introauced. NIR spectral images (650~1 000 nm, resolution: 2 nm) of peach samples were captured with developed hyperspectral imaging setup. Partial least square (PLS) regression prediction model was developed to estimate the sugar content in peach; step-wise backward method was utilized to determine optimal wavelength subsets. Experimental results show that the calibration model with optimal wavelength subsets has a correlation coefficient of prediction of 0.97 and a standard error of prediction of 0.19, the prediction accuracy is higher than the calibration model applied over the whole wavelength, which proves that variable selection plays an important role in improving the prediction accuracy of PLS regression model.
文摘Accurate load prediction plays an important role in smart power management system, either for planning, facing the increasing of load demand, maintenance issues, or power distribution system. In order to achieve a reasonable prediction, authors have applied and compared two features extraction technique presented by kernel partial least square regression and kernel principal component regression, and both of them are carried out by polynomial and Gaussian kernels to map the original features’ to high dimension features’ space, and then draw new predictor variables known as scores and loadings, while kernel principal component regression draws the predictor features to construct new predictor variables without any consideration to response vector. In contrast, kernel partial least square regression does take the response vector into consideration. Models are simulated by three different cities’ electric load data, which used historical load data in addition to weekends and holidays as common predictor features for all models. On the other hand temperature has been used for only one data as a comparative study to measure its effect. Models’ results evaluated by three statistic measurements, show that Gaussian Kernel Partial Least Square Regression offers the more powerful features and significantly can improve the load prediction performance than other presented models.