Funding: Supported by the National Social Science Foundation of China (Grant No. 22BTJ059).
Abstract: In this paper, we consider the partial linear regression model $y_i = x_i\beta^* + g(t_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $(x_i, t_i)$ are known fixed design points, $g(\cdot)$ is an unknown function, $\beta^*$ is an unknown parameter to be estimated, and the random errors $\varepsilon_i$ are $(\alpha, \beta)$-mixing random variables. The $p$-th ($p > 1$) mean consistency, strong consistency and complete consistency of the least squares estimators of $\beta^*$ and $g(\cdot)$ are investigated under some mild conditions. In addition, a numerical simulation is carried out to study the finite sample performance of the theoretical results. Finally, a real data analysis is provided to further verify the effect of the model.
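As a rough illustration of how least squares estimators of β* and g(·) can be computed in a partial linear model, the sketch below uses the common two-step construction: smooth x and y against t, regress the residuals on each other to estimate β*, then smooth y − xβ̂ to estimate g. The Gaussian kernel weights, the bandwidth, the scalar covariate, the i.i.d. errors, and all function names are assumptions made for this example; the abstract does not specify the weight functions and allows (α, β)-mixing errors.

```python
import numpy as np

def nw_weights(t, bandwidth):
    """Row-normalised Gaussian kernel weights W[i, j] = W_nj(t_i) (assumed weight choice)."""
    diff = (t[:, None] - t[None, :]) / bandwidth
    k = np.exp(-0.5 * diff ** 2)
    return k / k.sum(axis=1, keepdims=True)

def fit_partial_linear(x, t, y, bandwidth=0.1):
    """Return (beta_hat, g_hat) for a scalar covariate; g_hat is evaluated at the design points."""
    w = nw_weights(t, bandwidth)
    x_tilde = x - w @ x            # part of x not explained by t
    y_tilde = y - w @ y            # part of y not explained by t
    beta_hat = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)
    g_hat = w @ (y - x * beta_hat) # smooth what remains after removing the linear part
    return beta_hat, g_hat

# Simulated data: beta* = 2 and g(t) = sin(2*pi*t), both chosen only for illustration.
rng = np.random.default_rng(0)
n = 400
t = np.linspace(0.0, 1.0, n)
x = rng.normal(size=n)
g_true = np.sin(2 * np.pi * t)
y = 2.0 * x + g_true + 0.1 * rng.normal(size=n)  # i.i.d. errors for simplicity

beta_hat, g_hat = fit_partial_linear(x, t, y, bandwidth=0.05)
print(beta_hat, float(abs(g_hat - g_true).mean()))
```

On this simulated sample the estimate of β* should land close to 2, and the mean absolute error of ĝ should be small relative to the amplitude of g.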
Funding: Supported by the 948 Program of the State Forestry Administration (2009-4-43) and the National Natural Science Foundation of China (No. 30870420).
Abstract: Boreal forests play an important role in global environmental systems. Understanding boreal forest ecosystem structure and function requires accurate monitoring and estimation of forest canopy and biomass. We used partial least squares regression (PLSR) models to relate two forest parameters, canopy closure density and above-ground tree biomass, to Landsat ETM+ data. The established models were optimized according to the variable importance for projection (VIP) criterion and the bootstrap method, and their performance was compared using several statistical indices. All variables selected by the VIP criterion passed the bootstrap test (p < 0.05). The simplified models without insignificant variables (VIP < 1) performed as well as the full model but with less computation time. The relative root mean square error (RMSE%) was 29% for canopy closure density and 58% for above-ground tree biomass. We conclude that PLSR can be an effective method for estimating canopy closure density and above-ground biomass.
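The VIP criterion mentioned above can be sketched for a single response as follows. This is a minimal single-response PLS (PLS1, NIPALS-style) written with NumPy under the usual VIP definition, in which each predictor's squared weights are averaged over components in proportion to the response variance each component explains. The function name, the synthetic data, and the choice of two components are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def pls1_vip(X, y, n_components=2):
    """Fit PLS1 by NIPALS-style deflation and return the VIP score of each predictor."""
    Xr = X - X.mean(axis=0)              # centred predictors (deflated in the loop)
    yr = y - y.mean()                    # centred response (deflated in the loop)
    n_features = X.shape[1]
    weights = np.zeros((n_features, n_components))
    explained = np.zeros(n_components)   # response sum of squares explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)        # unit-norm weight vector
        t = Xr @ w                       # component scores
        tt = t @ t
        p = Xr.T @ t / tt                # predictor loadings
        q = (yr @ t) / tt                # response loading
        Xr = Xr - np.outer(t, p)         # deflate predictors
        yr = yr - q * t                  # deflate response
        weights[:, a] = w
        explained[a] = q ** 2 * tt
    return np.sqrt(n_features * (weights ** 2 @ explained) / explained.sum())

# Synthetic example: only the first of five predictors drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

vip = pls1_vip(X, y)
keep = vip >= 1.0    # the VIP < 1 rule drops the remaining predictors
print(vip.round(3), keep)
```

Because every weight vector has unit norm, the squared VIP scores sum to the number of predictors, which is why 1 is the conventional cut-off: it is the score a predictor would have if all predictors contributed equally.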
Abstract: With the development of mid-infrared (MIR) photoelectric devices, mid-infrared spectroscopy has become one of the important methods for non-invasive detection of blood glucose. The mid-infrared region (4000-400 cm⁻¹) contains the well-known fingerprint region of glucose (1200-800 cm⁻¹), which has clearer characteristic absorption peaks and better specificity, so the MIR carries a great deal of molecular information about glucose. Non-invasive detection of blood glucose by mid-infrared spectroscopy needs to achieve a certain accuracy, and the quantitative model is an important factor affecting that accuracy. In this paper, samples of an imitation solution containing only glucose and samples of an imitation mixed solution are taken as the research objects, and their mid-infrared spectral data are collected. For the 3000-900 cm⁻¹ band, a full-spectrum partial least squares regression (PLSR) model, an SNV + Ctr-PLSR model, an MSC + Ctr-PLSR model, and a convolutional neural network (CNN) model were constructed. For the 1200-900 cm⁻¹ band, a full-spectrum PLS model and a CNN model were constructed. The experimental results show that CNN is the optimal model for both bands: for the 3000-900 cm⁻¹ band, the correlation coefficient of the prediction set (Rp) is 0.95 and the root mean square error of the prediction set (RMSEP) is 22.10; for the 1200-900 cm⁻¹ band, Rp is 0.95 and RMSEP is 22.54. The results show that CNN is a promising method that has higher accuracy than PLSR and is especially suitable for modeling the complex human environment. In addition, the study provides a theoretical and practical basis for CNN in feature selection and model interpretation.
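The preprocessing and evaluation quantities named in this abstract can be illustrated with a short sketch. It assumes that SNV stands for the standard normal variate transform, as is usual in spectroscopy, and that Rp and RMSEP are the Pearson correlation and root mean square error computed on the prediction set. The function names and toy numbers are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rp(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Toy spectra: two rows with very different baselines and scales.
raw = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
scaled = snv(raw)
row_means = scaled.mean(axis=1)
row_stds = scaled.std(axis=1, ddof=1)

err = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
corr = rp([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(scaled, err, corr)
```

After SNV each spectrum has zero mean and unit standard deviation, which removes additive and multiplicative baseline differences between samples before a model such as PLSR is fitted.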
Funding: Supported by a grant from the Natural Sciences and Engineering Research Council of Canada.
Abstract: Consider a repeated measurement partially linear regression model with an unknown vector parameter $\beta_1$, an unknown function $g(\cdot)$, and unknown heteroscedastic error variances. In order to improve the semiparametric generalized least squares estimator (SGLSE) of $\beta_1$, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of the asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE cannot improve upon the SGLSE. These results are generalizations of those in [2] to the case of semiparametric regressions.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 51069017, 41071026). The authors express their sincere appreciation of the reviewers' valuable suggestions and comments in improving the quality of this paper.
Abstract: This study presents the application of partial least squares regression (PLSR) to estimating daily pan evaporation, exploiting the ability of PLSR to eliminate collinearity among predictor variables. Climate variables and daily pan evaporation data measured at two weather stations near Elephant Butte Reservoir, New Mexico, USA, and at a weather station in Shanshan County, Xinjiang, China, were used in the study. The nonlinear relationship between climate variables and daily pan evaporation was successfully modeled with the PLSR approach by resolving the collinearity that exists among the climate variables. The modeling results were compared with those of artificial neural network (ANN) models using the same input variables. The results showed that the nonlinear equations developed using PLSR perform similarly to the more complex ANN approach at the study sites. The modeling process was straightforward, and the equations were simpler and more explicit than the ANN black-box models.