Analysis of stock recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually re...Analysis of stock recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with defined outliers being down weighted. The application of LMD and LMD based Reweighted Least Squares (RLS) method to simulated and real fisheries SR data is explored.展开更多
During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regress...During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regression model which based on least square method distortion.And the stability of the model will be lost.The model will be built based on partial least square regression in the paper,through applying the idea of main component analyze and typical correlation analyze,the writer picks up some component from original material.Thus,the writer builds up the model of rice evapotranspiration to solve the multiple correlation among the independent variables (some weather factors).At last,the writer analyses the model in some parts,and gains the satisfied result.展开更多
Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi...Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene (OsTCTP) and regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000-8 000 cm-1 and 4 000-10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.展开更多
A method of multiple outputs least squares support vector regression (LS-SVR) was developed and described in detail, with the radial basis function (RBF) as the kernel function. The method was applied to predict t...A method of multiple outputs least squares support vector regression (LS-SVR) was developed and described in detail, with the radial basis function (RBF) as the kernel function. The method was applied to predict the future state of the power-shift steering transmission (PSST). A prediction model of PSST was gotten with multiple outputs LS-SVR. The model performance was greatly influenced by the penalty parameter γ and kernel parameter σ2 which were optimized using cross validation method. The training and prediction of the model were done with spectrometric oil analysis data. The predictive and actual values were compared and a fault in the second PSST was found. The research proved that this method had good accuracy in PSST fault prediction, and any possible problem in PSST could be found through a comparative analysis.展开更多
The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the exper...The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component is all in the range from 98.9% to 110.3% , which means the partial least squares regression spectrophotometry can circumvent the overlappirtg of absorption spectrums of mlulti-components, so that sctisfactory results can be obtained without any scrapple pre-separation.展开更多
Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed o...Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed obliquity, coal thickness, mining depth, etc. But the regression is unsuccessful. The result is that none of the parameters is suited, this is not up to objective reality. This paper presents a novel method, partial least squares regression (PLS regression), to construct the statistic model of strata-moving parameter β. The experiment shows that the forecasting model is reasonable.展开更多
In classical regression analysis, the error of independent variable is usually not taken into account in regression analysis. This paper presents two solution methods for the case that both the independent and the dep...In classical regression analysis, the error of independent variable is usually not taken into account in regression analysis. This paper presents two solution methods for the case that both the independent and the dependent variables have errors. These methods are derived from the condition-adjustment and indirect-adjustment models based on the Total-Least-Squares principle. The equivalence of these two methods is also proven in theory.展开更多
In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically ind...In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. But in fact, they have the tendency to be dependent, a phenomenon known as multicollinearity, especially in the cases of few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, meanwhile illustrate that the first PLS factor has enough ability to best describe land use patterns quantitatively, and most of the statistical relations derived from it accord with the fact. By the decreasing capacity of the PLS factors, the reliability of model outcome decreases correspondingly.展开更多
With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistica...With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR), partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor- mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis. On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.展开更多
Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map...Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.展开更多
In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived ...In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix;Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.展开更多
In this paper, based on the theory of parameter estimation, we give a selection method and, in a sense of a good character of the parameter estimation, we think that it is very reasonable. Moreover, we offer a calcula...In this paper, based on the theory of parameter estimation, we give a selection method and, in a sense of a good character of the parameter estimation, we think that it is very reasonable. Moreover, we offer a calculation method of selection statistic and an applied example.展开更多
Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperatur...Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.展开更多
As the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, it leads to slow prediction speed and limits its applications. The defects of the ex- isting adaptive pruning algorit...As the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, it leads to slow prediction speed and limits its applications. The defects of the ex- isting adaptive pruning algorithm for LS-SVRM are that the training speed is slow, and the generalization performance is not satis- factory, especially for large scale problems. Hence an improved algorithm is proposed. In order to accelerate the training speed, the pruned data point and fast leave-one-out error are employed to validate the temporary model obtained after decremental learning. The novel objective function in the termination condition which in- volves the whole constraints generated by all training data points and three pruning strategies are employed to improve the generali- zation performance. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The sparse LS-SVRM model has a faster training speed and better generalization performance.展开更多
To overcome the disadvantage that the standard least squares support vector regression(LS-SVR) algorithm is not suitable to multiple-input multiple-output(MIMO) system modelling directly,an improved LS-SVR algorithm w...To overcome the disadvantage that the standard least squares support vector regression(LS-SVR) algorithm is not suitable to multiple-input multiple-output(MIMO) system modelling directly,an improved LS-SVR algorithm which was defined as multi-output least squares support vector regression(MLSSVR) was put forward by adding samples' absolute errors in objective function and applied to flatness intelligent control.To solve the poor-precision problem of the control scheme based on effective matrix in flatness control,the predictive control was introduced into the control system and the effective matrix-predictive flatness control method was proposed by combining the merits of the two methods.Simulation experiment was conducted on 900HC reversible cold roll.The performance of effective matrix method and the effective matrix-predictive control method were compared,and the results demonstrate the validity of the effective matrix-predictive control method.展开更多
In reliability analysis,the stress-strength model is often used to describe the life of a component which has a random strength(X)and is subjected to a random stress(Y).In this paper,we considered the problem of estim...In reliability analysis,the stress-strength model is often used to describe the life of a component which has a random strength(X)and is subjected to a random stress(Y).In this paper,we considered the problem of estimating the reliability𝑅𝑅=P[Y<X]when the distributions of both stress and strength are independent and follow exponentiated Pareto distribution.The maximum likelihood estimator of the stress strength reliability is calculated under simple random sample,ranked set sampling and median ranked set sampling methods.Four different reliability estimators under median ranked set sampling are derived.Two estimators are obtained when both strength and stress have an odd or an even set size.The two other estimators are obtained when the strength has an odd size and the stress has an even set size and vice versa.The performances of the suggested estimators are compared with their competitors under simple random sample via a simulation study.The simulation study revealed that the stress strength reliability estimates based on ranked set sampling and median ranked set sampling are more efficient than their competitors via simple random sample.In general,the stress strength reliability estimates based on median ranked set sampling are smaller than the corresponding estimates under ranked set sampling and simple random sample methods.Keywords:Stress-Strength model,ranked set sampling,median ranked set sampling,maximum likelihood estimation,mean square error.corresponding estimates under ranked set sampling and simple random sample methods.展开更多
In this paper, the performance of existing biased estimators (Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Esti...In this paper, the performance of existing biased estimators (Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator) and the respective predictors were considered in a misspecified linear regression model when there exists multicollinearity among explanatory variables. A generalized form was used to compare these estimators and predictors in the mean square error sense. Further, theoretical findings were established using mean square error matrix and scalar mean square error. Finally, a numerical example and a Monte Carlo simulation study were done to illustrate the theoretical findings. The simulation study revealed that LE and RE outperform the other estimators when weak multicollinearity exists, and RE, r-k class and r-d class estimators outperform the other estimators when moderated and high multicollinearity exist for certain values of shrinkage parameters, respectively. The predictors based on the LE and RE are always superior to the other predictors for certain values of shrinkage parameters.展开更多
The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarde...The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.展开更多
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear mode...Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.展开更多
Recursive algorithms are very useful for computing M-estimators of regression coefficients and scatter parameters. In this article, it is shown that for a nondecreasing ul (t), under some mild conditions the recursi...Recursive algorithms are very useful for computing M-estimators of regression coefficients and scatter parameters. In this article, it is shown that for a nondecreasing ul (t), under some mild conditions the recursive M-estimators of regression coefficients and scatter parameters are strongly consistent and the recursive M-estimator of the regression coefficients is also asymptotically normal distributed. Furthermore, optimal recursive M-estimators, asymptotic efficiencies of recursive M-estimators and asymptotic relative efficiencies between recursive M-estimators of regression coefficients are studied.展开更多
文摘Analysis of stock recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with defined outliers being down weighted. The application of LMD and LMD based Reweighted Least Squares (RLS) method to simulated and real fisheries SR data is explored.
文摘During the course of calculating the rice evapotranspiration using weather factors,we often find that some independent variables have multiple correlation.The phenomena can lead to the traditional multivariate regression model which based on least square method distortion.And the stability of the model will be lost.The model will be built based on partial least square regression in the paper,through applying the idea of main component analyze and typical correlation analyze,the writer picks up some component from original material.Thus,the writer builds up the model of rice evapotranspiration to solve the multiple correlation among the independent variables (some weather factors).At last,the writer analyses the model in some parts,and gains the satisfied result.
基金supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No.2010R50028)the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No.2006BAK02A18)
文摘Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene (OsTCTP) and regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000-8 000 cm-1 and 4 000-10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
基金Supported by the Ministerial Level Advanced Research Foundation(3031030)the"111"Project(B08043)
文摘A method of multiple outputs least squares support vector regression (LS-SVR) was developed and described in detail, with the radial basis function (RBF) as the kernel function. The method was applied to predict the future state of the power-shift steering transmission (PSST). A prediction model of PSST was gotten with multiple outputs LS-SVR. The model performance was greatly influenced by the penalty parameter γ and kernel parameter σ2 which were optimized using cross validation method. The training and prediction of the model were done with spectrometric oil analysis data. The predictive and actual values were compared and a fault in the second PSST was found. The research proved that this method had good accuracy in PSST fault prediction, and any possible problem in PSST could be found through a comparative analysis.
文摘The computer auxiliary partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, r-Schisandrin in the extracted solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component is all in the range from 98.9% to 110.3% , which means the partial least squares regression spectrophotometry can circumvent the overlappirtg of absorption spectrums of mlulti-components, so that sctisfactory results can be obtained without any scrapple pre-separation.
基金Project(030501801) supported by the Key Laboratory of the State Bureau of Surveying and Mapping in Geographical Space InformationEngineering
文摘Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed obliquity, coal thickness, mining depth, etc. But the regression is unsuccessful. The result is that none of the parameters is suited, this is not up to objective reality. This paper presents a novel method, partial least squares regression (PLS regression), to construct the statistic model of strata-moving parameter β. The experiment shows that the forecasting model is reasonable.
基金supported by the National Nature Science Foundation of China (41174009)
文摘In classical regression analysis, the error of independent variable is usually not taken into account in regression analysis. This paper presents two solution methods for the case that both the independent and the dependent variables have errors. These methods are derived from the condition-adjustment and indirect-adjustment models based on the Total-Least-Squares principle. The equivalence of these two methods is also proven in theory.
基金National Natural Science Foundation of China No.40301038
文摘In several LUCC studies, statistical methods are being used to analyze land use data. A problem using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. But in fact, they have the tendency to be dependent, a phenomenon known as multicollinearity, especially in the cases of few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, meanwhile illustrate that the first PLS factor has enough ability to best describe land use patterns quantitatively, and most of the statistical relations derived from it accord with the fact. By the decreasing capacity of the PLS factors, the reliability of model outcome decreases correspondingly.
基金founded by the National Natural Science Foundation of China(81202283,81473070,81373102 and81202267)Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China(10KJA330034 and11KJA330001)+1 种基金the Research Fund for the Doctoral Program of Higher Education of China(20113234110002)the Priority Academic Program for the Development of Jiangsu Higher Education Institutions(Public Health and Preventive Medicine)
文摘With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR), partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor- mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis. On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
文摘Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.
文摘In regression, despite being both aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix;Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
基金Supported by the Natural Science Foundation of Anhui Education Committee
文摘In this paper, based on the theory of parameter estimation, we give a selection method and, in a sense of a good character of the parameter estimation, we think that it is very reasonable. Moreover, we offer a calculation method of selection statistic and an applied example.
基金the National Natural Science Foundation of China (41171281, 40701120)the Beijing Nova Program, China (2008B33)
文摘Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.
基金supported by the National Natural Science Foundation of China (61074127)
文摘As the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, it leads to slow prediction speed and limits its applications. The defects of the ex- isting adaptive pruning algorithm for LS-SVRM are that the training speed is slow, and the generalization performance is not satis- factory, especially for large scale problems. Hence an improved algorithm is proposed. In order to accelerate the training speed, the pruned data point and fast leave-one-out error are employed to validate the temporary model obtained after decremental learning. The novel objective function in the termination condition which in- volves the whole constraints generated by all training data points and three pruning strategies are employed to improve the generali- zation performance. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The sparse LS-SVRM model has a faster training speed and better generalization performance.
基金Project(50675186) supported by the National Natural Science Foundation of China
文摘To overcome the disadvantage that the standard least squares support vector regression(LS-SVR) algorithm is not suitable to multiple-input multiple-output(MIMO) system modelling directly,an improved LS-SVR algorithm which was defined as multi-output least squares support vector regression(MLSSVR) was put forward by adding samples' absolute errors in objective function and applied to flatness intelligent control.To solve the poor-precision problem of the control scheme based on effective matrix in flatness control,the predictive control was introduced into the control system and the effective matrix-predictive flatness control method was proposed by combining the merits of the two methods.Simulation experiment was conducted on 900HC reversible cold roll.The performance of effective matrix method and the effective matrix-predictive control method were compared,and the results demonstrate the validity of the effective matrix-predictive control method.
文摘In reliability analysis,the stress-strength model is often used to describe the life of a component which has a random strength(X)and is subjected to a random stress(Y).In this paper,we considered the problem of estimating the reliability𝑅𝑅=P[Y<X]when the distributions of both stress and strength are independent and follow exponentiated Pareto distribution.The maximum likelihood estimator of the stress strength reliability is calculated under simple random sample,ranked set sampling and median ranked set sampling methods.Four different reliability estimators under median ranked set sampling are derived.Two estimators are obtained when both strength and stress have an odd or an even set size.The two other estimators are obtained when the strength has an odd size and the stress has an even set size and vice versa.The performances of the suggested estimators are compared with their competitors under simple random sample via a simulation study.The simulation study revealed that the stress strength reliability estimates based on ranked set sampling and median ranked set sampling are more efficient than their competitors via simple random sample.In general,the stress strength reliability estimates based on median ranked set sampling are smaller than the corresponding estimates under ranked set sampling and simple random sample methods.Keywords:Stress-Strength model,ranked set sampling,median ranked set sampling,maximum likelihood estimation,mean square error.corresponding estimates under ranked set sampling and simple random sample methods.
文摘In this paper, the performance of existing biased estimators (Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator) and the respective predictors were considered in a misspecified linear regression model when there exists multicollinearity among explanatory variables. A generalized form was used to compare these estimators and predictors in the mean square error sense. Further, theoretical findings were established using mean square error matrix and scalar mean square error. Finally, a numerical example and a Monte Carlo simulation study were done to illustrate the theoretical findings. The simulation study revealed that LE and RE outperform the other estimators when weak multicollinearity exists, and RE, r-k class and r-d class estimators outperform the other estimators when moderated and high multicollinearity exist for certain values of shrinkage parameters, respectively. The predictors based on the LE and RE are always superior to the other predictors for certain values of shrinkage parameters.
基金Supported by National Natural Science Foundation of China (No.50478086)Tianjin Special Scientific Innovation Foundation (No.06FZZDSH00900)
文摘The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.
文摘Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.
基金supported by the Natural Sciences and Engineering Research Council of Canadathe National Natural Science Foundation of China+2 种基金the Doctorial Fund of Education Ministry of Chinasupported by the Natural Sciences and Engineering Research Council of Canadasupported by the National Natural Science Foundation of China
文摘Recursive algorithms are very useful for computing M-estimators of regression coefficients and scatter parameters. In this article, it is shown that for a nondecreasing ul (t), under some mild conditions the recursive M-estimators of regression coefficients and scatter parameters are strongly consistent and the recursive M-estimator of the regression coefficients is also asymptotically normal distributed. Furthermore, optimal recursive M-estimators, asymptotic efficiencies of recursive M-estimators and asymptotic relative efficiencies between recursive M-estimators of regression coefficients are studied.