Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance of rice (Oryza sativa L.) was measured over the wavelength range from 350 to 2500 nm on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan). The percentage of the leaf surface covered by lesions was estimated and defined as the disease severity. Multiple stepwise regression, principal component analysis, and partial least-squares regression were used to estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps, with root mean square errors (RMSEs) of 6.5% and 5.8% for the training (n=210) and testing (n=53) datasets, respectively. Principal component analysis showed that the first principal component explained approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-squares regression with seven extracted factors predicted disease severity most effectively of the methods compared, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot from hyperspectral reflectance data at the leaf level.
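The two quantities the rice study reports, disease severity as a lesion-area percentage and the RMSE used to compare models, can be computed as in the minimal sketch below. The function names and sample numbers are hypothetical illustrations, not data from the study.

```python
import math


def severity_percent(lesion_area, leaf_area):
    """Disease severity: percentage of the leaf surface covered by lesions."""
    return 100.0 * lesion_area / leaf_area


def rmse(predicted, observed):
    """Root mean square error between predicted and observed severities."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example: three leaves with a leaf area of 50 units each.
observed = [severity_percent(a, 50.0) for a in (2.5, 10.0, 25.0)]  # 5, 20, 50
predicted = [6.0, 18.0, 53.0]
print(round(rmse(predicted, observed), 3))
```

Because RMSE is expressed in the units of the response, an RMSE of a few percent here means a few percentage points of lesion coverage.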
Statistical methods are used to analyze land use data in several land use and land cover change (LUCC) studies. A problem with using conventional statistical methods in land use analysis is that these methods assume the explanatory variables to be statistically independent. In practice, however, these variables tend to be correlated with one another, a phenomenon known as multicollinearity, which is especially problematic when there are few observations. In this paper, a partial least-squares (PLS) regression approach is developed to study the relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and its influencing factors reveal the land use characteristics of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor describes land use patterns quantitatively best, and that most of the statistical relations derived from it agree with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
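PLS handles multicollinearity by regressing the response on a few latent factors built from the correlated predictors. The sketch below is a minimal pure-Python PLS1 (single response) fit using the NIPALS algorithm; it is an illustration under that assumption, not the paper's implementation, and the names `pls1_fit` and `pls1_predict` are hypothetical. In practice a library routine such as scikit-learn's `PLSRegression` would normally be used.

```python
def _mean(values):
    return sum(values) / len(values)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS. X: n rows of p predictors; y: n responses."""
    n, p = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(p)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    W, P, q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the current y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = _dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]                        # factor scores
        tt = _dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = _dot(f, t) / tt
        # Deflate X and y by the part explained by this factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        q.append(q_a)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q}


def pls1_predict(model, x):
    """Predict one response by replaying the deflation steps on a new row x."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q_a in zip(model["W"], model["P"], model["q"]):
        t = _dot(e, w)
        y_hat += q_a * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Two strongly correlated predictors (hypothetical data).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1 + 2 * a - b for a, b in X]
model = pls1_fit(X, y, n_components=2)
print([round(pls1_predict(model, row), 6) for row in X])
```

With as many factors as the rank of X the fit coincides with ordinary least squares; keeping fewer factors, as the study does with four, trades some fit for stability under multicollinearity.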
The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde, and methyl salicylate overlap severely, so these compounds cannot be determined in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples are analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3%, and 99.1%, and the relative standard deviations (RSDs) are 1.87%, 1.98%, 1.94%, 0.960%, and 0.672%, respectively.
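The figures of merit reported for the UV method can be computed as below, assuming the usual definitions: recovery as determined amount over added amount, and RSD as sample standard deviation over mean. The amounts in the example are hypothetical.

```python
import statistics


def recovery_percent(found, added):
    """Recovery: amount determined by the model relative to the amount added."""
    return 100.0 * found / added


def rsd_percent(values):
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical (found, added) amounts of one compound in three synthetic mixtures.
pairs = [(9.9, 10.0), (19.8, 20.0), (5.05, 5.0)]
recoveries = [recovery_percent(found, added) for found, added in pairs]
print(round(statistics.mean(recoveries), 2), round(rsd_percent(recoveries), 2))
```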
In this paper, we propose double-penalized quantile regression estimators for partially linear models, together with an iterative algorithm for solving the resulting optimization problem. Numerical examples show that, when the error distribution is heavy-tailed, the proposed method performs better in finite samples than the least-squares-based method with regard to the non-causal selection rate (NSR) and the median of model error (MME). Finally, we apply the proposed methodology to analyze the ragweed pollen level dataset.
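Quantile regression replaces the squared error of least squares with the check (pinball) loss, which is why it is less sensitive to heavy-tailed errors. The sketch below shows only that loss and its effect on fitting a constant; the double penalty and the partially linear structure of the proposed estimators are not modelled, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Constant minimising the summed check loss; the loss is piecewise linear and
    convex with kinks at the data, so searching over the sample points suffices."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier (heavy tail)
print(best_constant(sample, 0.5))     # the median, unaffected by the outlier
print(sum(sample) / len(sample))      # the least-squares constant (mean) is pulled upward
```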
Consider a regression model with observations i = 1, ..., n, in which the design points (x_i, t_i) are known and nonrandom and the e_i are random errors. A family of nonparametric estimates of g(·), which includes the known estimates proposed by Gasser and Müller [1], is proposed; this family also yields a class of new nearest-neighbor estimates of g(·). Based on these nonparametric regression procedures, we investigate a statistic for testing H0: g = 0 and obtain some asymptotic results for the estimates.
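For reference, the Gasser-Müller estimate mentioned above weights each response y_i by the integral of the scaled kernel over an interval around t_i. A minimal sketch, assuming sorted design points in [0, 1], midpoint interval boundaries, and an Epanechnikov kernel (all choices made here for illustration; the paper's new nearest-neighbor weights and the test of H0: g = 0 are not shown):

```python
def epanechnikov_cdf(u):
    """Integral of the Epanechnikov kernel K(v) = 0.75 (1 - v^2) from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * (u - u ** 3 / 3.0)


def gasser_muller(t, ts, ys, h, lower=0.0, upper=1.0):
    """Gasser-Mueller estimate of g(t) from sorted design points ts in [lower, upper]."""
    n = len(ts)
    # Interval boundaries s_0 < ... < s_n: midpoints between neighbouring design points.
    s = [lower] + [(ts[i] + ts[i + 1]) / 2.0 for i in range(n - 1)] + [upper]
    estimate = 0.0
    for i in range(n):
        # Weight = integral of K_h(t - u) over [s_i, s_{i+1}], via the kernel CDF.
        weight = epanechnikov_cdf((t - s[i]) / h) - epanechnikov_cdf((t - s[i + 1]) / h)
        estimate += weight * ys[i]
    return estimate


ts = [(i + 0.5) / 50 for i in range(50)]
ys = [2.0 * t for t in ts]  # hypothetical noise-free g(t) = 2t
print(round(gasser_muller(0.5, ts, ys, h=0.1), 6))
```

Away from the boundaries the weights telescope to one, so a constant function is reproduced exactly.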
To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a single-dependent-variable model relating the economic loss, calculated from the decrease in yield, to the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Cross-validation indicated that the optimal number of principal components was 3, as determined by the minimum prediction residual error sum of squares (PRESS), and the prediction error of the regression equation ranged from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and with the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺, and Mg²⁺. The precision of the model may be improved if the nonlinearity of the original data is addressed.
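PRESS is the sum of squared leave-one-out prediction errors; the number of components with the smallest PRESS is kept. The sketch below computes PRESS for any fitting routine and, as an assumption for brevity, uses a straight line and a constant as stand-ins for PLS models with different numbers of components. All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_mean(xs, ys):
    """Constant model: always predicts the mean response."""
    m = sum(ys) / len(ys)
    return lambda x: m


def press(xs, ys, fit):
    """Prediction residual error sum of squares via leave-one-out refits."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - model(xs[i])) ** 2
    return total


def relative_error_percent(predicted, actual):
    """Signed prediction error relative to the actual value, in percent."""
    return 100.0 * (predicted - actual) / actual


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
print(press(xs, ys, fit_line), press(xs, ys, fit_mean))  # pick the smaller PRESS
```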
This article is concerned with estimation in semiparametric varying-coefficient partially linear regression models. By combining local polynomial and least squares procedures, Fan and Huang (2005) proposed a profile least squares estimator for the parametric component and established its asymptotic normality. We further show that the profile least squares estimator satisfies the law of the iterated logarithm. Moreover, we study the estimators of the functions characterizing the nonlinear part as well as the estimator of the error variance, and derive the strong convergence rate and the law of the iterated logarithm for each of them.
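The local polynomial ingredient of the profile least squares procedure is easiest to see in its degree-one (local linear) form: at each target point t0, a kernel-weighted straight line is fitted and its intercept is the estimate. A minimal sketch, assuming a Gaussian kernel; this shows only the smoother, not Fan and Huang's full procedure, and `local_linear` is a hypothetical name.

```python
import math


def local_linear(t0, ts, ys, h):
    """Local linear estimate of g(t0) using a Gaussian kernel with bandwidth h."""
    w = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in ts]
    s0 = sum(w)
    s1 = sum(wi * (t - t0) for wi, t in zip(w, ts))
    s2 = sum(wi * (t - t0) ** 2 for wi, t in zip(w, ts))
    r0 = sum(wi * y for wi, y in zip(w, ys))
    r1 = sum(wi * (t - t0) * y for wi, t, y in zip(w, ts, ys))
    # Intercept of the weighted line centred at t0, from the 2x2 normal equations
    # [[s0, s1], [s1, s2]] [a, b] = [r0, r1].
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)


ts = [i / 20 for i in range(21)]
ys = [1.0 + 2.0 * t for t in ts]  # hypothetical noise-free linear g
print(round(local_linear(0.3, ts, ys, h=0.1), 6), round(local_linear(0.0, ts, ys, h=0.1), 6))
```

A local linear fit reproduces straight lines exactly, even at the boundary, which is one reason it is preferred over local constant smoothing.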
Statistical downscaling (SD) analyzes the relationship between a local-scale response and global-scale predictors. An SD model can be used to forecast local-scale rainfall from the global-scale precipitation output of a global circulation model (GCM). The objectives of this research were to determine the time lag of the GCM data and to build an SD model by principal component regression (PCR) using time-lagged GCM precipitation data. Rainfall observations in Indramayu from 1979 to 2007 showed patterns similar to the GCM data on grids 1 through 64 after a time shift (time lag), which was determined using the cross-correlation function. However, the GCM data from the 64 grids exhibited multicollinearity. This problem was solved by PCR, but the PCR model produced heterogeneous errors. The PCR model was therefore modified by adding dummy variables, which were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction, and the SD model with lagged GCM predictors also outperformed the SD model without lagged GCM predictors.
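Choosing a time lag with the cross-correlation function amounts to correlating the response with the predictor shifted by each candidate lag and keeping the lag with the largest correlation. A minimal sketch, assuming the convention that a positive lag means the response follows the predictor; the series are synthetic and the function names hypothetical.

```python
import random


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0."""
    return correlation(x[: len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: cross_correlation(x, y, k))


rng = random.Random(0)
gcm = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stand-in predictor series
rain = [0.0] * 3 + gcm[:-3]                        # responds three steps later
print(best_lag(gcm, rain, 12))
```

In a multi-grid setting like the one described, this would be repeated per grid before the lagged series enter the PCR model.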
In this article, hypothesis testing for coefficients in high-dimensional regression models is considered. I develop a simultaneous test statistic for this hypothesis test in both linear and partially linear models. The test is designed for growing p and fixed n, a setting in which the conventional F-test is no longer appropriate. The asymptotic distribution of the proposed test statistic under the null hypothesis is obtained.
We consider a functional partially linear additive model that predicts a functional response from a scalar predictor and functional predictors. A B-spline and eigenbasis least squares estimator is proposed for both the parametric and the nonparametric components. Finally, we obtain the variance decomposition of the model and establish the asymptotic convergence rate of the estimator.
Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. However, regression analysis on log-transformed data estimates the relative effect of a predictor, whereas it is often the absolute effect that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated in a large simulation study: log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS), and weighted least-squares regression (WLS). All three methods produced unbiased estimates of the parameters and the expected response, and ML and WLS yielded smaller standard errors than LS. In most situations, the Wald statistic used for tests of the ML estimates was approximately normal and produced the correct type I error risk. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β₁.
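Two ideas from this abstract can be made concrete: weighted least squares, one of the comparison methods, and why a log-scale coefficient is a relative effect. The sketch assumes a single predictor, known weights (typically inverse variances), and a constant log-scale standard deviation in the second part; it is not the proposed ML method, and all names are hypothetical.

```python
import math


def wls_line(xs, ys, weights):
    """Weighted least-squares line minimising sum w_i (y_i - a - b x_i)^2."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def lognormal_mean(mu, sigma):
    """E[Y] when log(Y) ~ Normal(mu, sigma^2): exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


# A log-scale slope b1 multiplies E[Y] by exp(b1) per unit of x (relative effect),
# so the absolute change in E[Y] depends on where x starts.
b0, b1, sigma = 1.0, 0.2, 0.5
for x in (0.0, 5.0):
    change = lognormal_mean(b0 + b1 * (x + 1), sigma) - lognormal_mean(b0 + b1 * x, sigma)
    print(x, round(change, 3))
```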
We construct a fuzzy varying-coefficient bilinear regression model to deal with interval financial data and adopt a least-squares method based on the symmetric fuzzy number space. First, we propose a varying-coefficient model built on the fuzzy bilinear regression model. Second, we develop a least-squares method based on the complete distance between fuzzy numbers to estimate the coefficients, and we test the adaptability of the proposed model by a generalized likelihood ratio test with an SSE composite index. Finally, mean square errors and mean absolute errors are used to evaluate and compare the fitting and forecasting performance of the fuzzy autoregression, fuzzy bilinear regression, and fuzzy varying-coefficient bilinear regression models. Empirical analysis shows that, for the capital market, the proposed model achieves good fitting and forecasting accuracy compared with the other regression models.
In this paper, we consider the partial linear regression model $y_i = x_i\beta^* + g(t_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where the $(x_i, t_i)$ are known fixed design points, $g(\cdot)$ is an unknown function, $\beta^*$ is an unknown parameter to be estimated, and the random errors $\varepsilon_i$ are $(\alpha, \beta)$-mixing random variables. The $p$-th mean consistency ($p > 1$), strong consistency, and complete consistency of the least squares estimators of $\beta^*$ and $g(\cdot)$ are investigated under some mild conditions. In addition, a numerical simulation is carried out to study the finite-sample performance of the theoretical results. Finally, a real data analysis is provided to further verify the effectiveness of the model.
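One standard way to obtain least squares estimators in the model $y_i = x_i\beta^* + g(t_i) + \varepsilon_i$ is the partial-residual approach: smooth x and y against t, regress the y residuals on the x residuals to get $\beta$, then smooth $y - x\beta$ to recover g. The sketch below assumes a scalar x, Nadaraya-Watson smoothing with a Gaussian kernel, and noise-free data; it is an illustration of that generic technique, not the paper's estimator, and the mixing-error setting is not modelled. The names are hypothetical.

```python
import math


def nw_smooth(ts, values, h):
    """Nadaraya-Watson (Gaussian kernel) smooth of values against ts, at each t_i."""
    smoothed = []
    for t0 in ts:
        w = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in ts]
        smoothed.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return smoothed


def partial_linear_fit(xs, ts, ys, h):
    """Estimate beta and g(t_i) in y_i = x_i * beta + g(t_i) + e_i (scalar x)."""
    x_res = [x - m for x, m in zip(xs, nw_smooth(ts, xs, h))]
    y_res = [y - m for y, m in zip(ys, nw_smooth(ts, ys, h))]
    # Least squares of the y partial residuals on the x partial residuals.
    beta = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
    # Smooth what is left after removing the linear part to recover g.
    g_hat = nw_smooth(ts, [y - beta * x for x, y in zip(xs, ys)], h)
    return beta, g_hat


# Noise-free check: beta = 2, g(t) = sin(2*pi*t), x alternates in sign.
n = 200
ts = [i / n for i in range(n)]
xs = [(-1.0) ** i for i in range(n)]
ys = [2.0 * x + math.sin(2 * math.pi * t) for x, t in zip(xs, ts)]
beta_hat, g_hat = partial_linear_fit(xs, ts, ys, h=0.05)
print(round(beta_hat, 3))
```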
Deformation prediction models of the Wuqiangxi concrete gravity dam are developed, including two statistical models and a deep learning model. In the statistical models, the reliable monitoring data are first identified with the Lahitte criterion; then, stepwise regression and partial least squares regression models for deformation prediction of the dam are constructed from the reliable monitoring data, with water pressure, temperature, and time-effect factors included in the models; finally, using monitoring data from 2006 to 2020 at five typical measuring points on the crest of the dam, namely J23 (dam section 24#), J33 (dam section 4#), J35 (dam section 8#), J37 (dam section 12#), and J39 (dam section 15#), the settlement curves of the measuring points are obtained with the stepwise regression and partial least squares regression models. The deep learning model is based on a long short-term memory (LSTM) recurrent neural network. It uses two LSTM layers with the rectified linear unit (ReLU) as the activation function and an input sequence length of 20, and random search is adopted. The monitoring data for the five typical measuring points from 2006 to 2017 are used as the training set, and those from 2018 to 2020 as the test set. The case study shows that (1) both statistical models give good fitting results; (2) the partial least squares regression algorithm can solve models with highly correlated factors and explain the factors reasonably; and (3) the prediction accuracy of the LSTM model increases with the amount of training data. For deformation prediction of concrete gravity dams, the LSTM model is therefore recommended when sufficient training data are available, and partial least squares regression when the training data are insufficient.
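The data preparation implied for the LSTM model, input windows of length 20 and a chronological split with 2006 to 2017 for training and 2018 to 2020 for testing, can be sketched as follows. The network itself is not shown; the function names and the one-value-per-step series are hypothetical assumptions.

```python
def make_windows(series, length=20):
    """Split a series into (input window, next value) pairs for a sequence model."""
    inputs, targets = [], []
    for end in range(length, len(series)):
        inputs.append(series[end - length:end])
        targets.append(series[end])
    return inputs, targets


def split_by_year(years, values, last_train_year=2017):
    """Chronological split: observations up to last_train_year form the training set."""
    train = [v for yr, v in zip(years, values) if yr <= last_train_year]
    test = [v for yr, v in zip(years, values) if yr > last_train_year]
    return train, test


# Hypothetical settlement readings, one per step.
series = [0.1 * i for i in range(30)]
X, y = make_windows(series, length=20)
print(len(X), len(X[0]), y[0])
```

Splitting by year rather than at random keeps future readings out of the training set; note that the first test windows still need the last 20 training readings as their history.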
This paper proposes a procedure based on the F-statistic for testing the regression coefficients in high-dimensional partially linear models. The authors first estimate the unknown nonlinear component by nonparametric methods and then generalize the F-statistic to test the regression coefficients under some regularity conditions. In this procedure, the estimation of the nonlinear component makes it challenging to establish the properties of the generalized F-test. The authors obtain asymptotic properties of the generalized F-test in more general cases, including its asymptotic normality and its power when p/n ∈ (0, 1), without a normality assumption. The asymptotic result is general, and under some additional constraints similar conclusions can be obtained for high-dimensional linear models. Simulation studies demonstrate good finite-sample performance of the proposed test, in agreement with the theoretical results. The practical utility of the method is illustrated by a real data example.
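For orientation, the classical F-statistic that the paper generalizes compares the residual sums of squares of a restricted and a full model. The sketch below shows only that classical form, not the generalized test with an estimated nonlinear component; it also makes visible why the residual degrees of freedom n − p become a problem as p approaches n.

```python
def f_statistic(rss_restricted, rss_full, q, n, p):
    """F = ((RSS_0 - RSS_1) / q) / (RSS_1 / (n - p)) for q linear restrictions.

    p is the number of coefficients in the full model; the residual degrees of
    freedom n - p must be positive for the statistic to be defined.
    """
    if n <= p:
        raise ValueError("need n > p: residual degrees of freedom are not positive")
    return ((rss_restricted - rss_full) / q) / (rss_full / (n - p))


print(round(f_statistic(100.0, 60.0, q=2, n=30, p=5), 4))
```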
We consider semiparametric partially linear regression models with mean function $X^{T}\beta + g(z)$, where $X$ and $z$ are functional data. New estimators of $\beta$ and $g(z)$ are presented, and some asymptotic results are given, including the strong convergence rates of the proposed estimators. In our estimation, the number of observations for each subject is completely flexible. A simulation study is conducted to investigate the finite-sample performance of the proposed estimators.
Consider a partially linear regression model with an unknown vector parameter, an unknown function g(·), and unknown heteroscedastic error variances. Chen and You [23] proposed a semiparametric generalized least squares estimator (SGLSE) of the vector parameter, which takes the heteroscedasticity into account to increase efficiency. Inference based on this SGLSE requires a consistent estimator of its asymptotic covariance matrix. However, when within-group correlation exists, the traditional delta method and the delete-1 jackknife fail to provide such a consistent estimator. In this paper, a delete-group jackknife method based on deleting grouped partial residuals is examined. It is shown that the delete-group jackknife indeed provides a consistent estimator of the asymptotic covariance matrix in the presence of within-group correlation. This result extends that in [21].
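The idea behind the delete-group jackknife is to leave out whole groups rather than single observations, so that within-group correlation is preserved in every replicate. A minimal sketch for a scalar estimator with equal-sized groups (a simplifying assumption; the paper works with the covariance matrix of the SGLSE and grouped partial residuals, which is not reproduced here). The function names are hypothetical.

```python
def delete_group_jackknife_var(groups, estimator):
    """Delete-group jackknife variance of a scalar estimator (equal group sizes assumed).

    groups: list of lists of observations; whole groups are left out in turn.
    """
    g = len(groups)
    replicates = []
    for left_out in range(g):
        kept = [obs for k, grp in enumerate(groups) if k != left_out for obs in grp]
        replicates.append(estimator(kept))
    centre = sum(replicates) / g
    return (g - 1) / g * sum((r - centre) ** 2 for r in replicates)


def sample_mean(values):
    return sum(values) / len(values)


# With singleton groups this reduces to the delete-1 jackknife (s^2 / n for the mean).
print(delete_group_jackknife_var([[1.0], [2.0], [3.0], [4.0], [5.0]], sample_mean))
```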
In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension. The missing observations may be in the responses, in the covariates, or in both the responses and part of the covariates, and are missing at random; "ultra-high dimension" means that the dimension of the parameter is much larger than the sample size. Using B-splines for the varying-coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates whose coefficients are nonzero, and we discuss the asymptotic normality of the oracle estimator of the linear parameter. Since the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. The finite-sample behavior of the proposed methods is investigated via simulations and a real data analysis.
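The abstract does not name its non-convex penalty. One widely used choice is SCAD (Fan and Li, 2001), shown below purely as an assumed example: it behaves like the lasso near zero but levels off for large coefficients, so large effects are not over-shrunk. The function name is hypothetical.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001); a = 3.7 is their suggested default."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                                # lasso-like part
    if t <= a * lam:
        return (2.0 * a * lam * t - t ** 2 - lam ** 2) / (2.0 * (a - 1.0))  # quadratic blend
    return lam ** 2 * (a + 1.0) / 2.0                                 # constant: no extra shrinkage


for theta in (0.5, 1.0, 2.0, 3.7, 10.0):
    print(theta, round(scad_penalty(theta, lam=1.0), 4))
```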
This paper considers tests for regression coefficients in high-dimensional partially linear models. The authors first use the B-spline method to approximate the unknown smooth function so that it can be expressed linearly. They then propose an empirical likelihood method to test the regression coefficients and derive the asymptotic chi-squared distribution, with two degrees of freedom, of the proposed test statistic under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method performs well in controlling the type I error rate and in power. The proposed method is also employed to analyze a Skin Cutaneous Melanoma (SKCM) dataset.
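Approximating g by B-splines makes it linear in unknown coefficients: $g(t) \approx \sum_j \gamma_j B_j(t)$, so each observation contributes one row of basis values to the design matrix. A minimal sketch of the Cox-de Boor recursion, assuming clamped cubic knots on [0, 1] chosen for illustration; the function names are hypothetical.

```python
def bspline_basis(i, degree, t, knots):
    """Value at t of the i-th B-spline of the given degree (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + degree] - knots[i]
    if left_span > 0:  # terms over a zero-length span are defined as 0
        value += (t - knots[i]) / left_span * bspline_basis(i, degree - 1, t, knots)
    right_span = knots[i + degree + 1] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + degree + 1] - t) / right_span * bspline_basis(i + 1, degree - 1, t, knots)
    return value


def design_row(t, knots, degree=3):
    """Row of the spline design matrix: g(t) is approximated by sum_j gamma_j B_j(t)."""
    n_basis = len(knots) - degree - 1
    return [bspline_basis(j, degree, t, knots) for j in range(n_basis)]


# Clamped cubic knots on [0, 1] with interior knots 0.25, 0.5, 0.75 (7 basis functions).
knots = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4
row = design_row(0.3, knots)
print(len(row), round(sum(row), 12))
```

The basis functions sum to one inside the domain; because the intervals are half-open, the right endpoint t = 1 needs special handling in practice.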
Testing for heteroscedasticity determines whether a regression model predicts the dependent variable with consistent precision across all values of the explanatory variables. Since existing tests cannot detect heteroscedasticity in all cases, particularly under heavy-tailed distributions, the authors establish a new comprehensive test statistic based on Levene's test. They establish the asymptotic normality of the test statistic under the null hypothesis of homoscedasticity, based on the recent theory of analysis of variance for an infinite number of factor levels. The proposed test applies Levene's test to the residuals from a regression fit of the mean function to assess the homogeneity of variance. Simulation studies show that the test performs better than other methods in almost all cases, even when the variance is a nonlinear function. Finally, the proposed method is applied to a real dataset.
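The classical Levene statistic that the proposed test builds on is a one-way ANOVA F computed on absolute deviations from group means. A minimal sketch, assuming the residuals have already been split into groups (for example by binning fitted values, an assumption made here for illustration); the paper's infinite-factor-level asymptotics are not reproduced.

```python
def levene_statistic(groups):
    """Levene's W using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical residual groups: the second group is far more spread out.
print(round(levene_statistic([[1.0, 2.0, 3.0], [0.0, 5.0, 10.0]]), 4))
```

Large values of W indicate unequal spread across groups, that is, evidence of heteroscedasticity.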
Funding (rice brown spot study): Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203); National Science and Technology Task Force Project (No. 2006BAD10A01), China.
Funding (Suzhou-Wuxi-Changzhou land use study): National Natural Science Foundation of China, No. 40301038.
Funding (acid rain crop loss study): Natural Basic Research Program of China, grant No. 2005CB422207.
Funding (varying-coefficient partially linear model study): National Natural Science Funds for Distinguished Young Scholar (70825004); National Natural Science Foundation of China (NSFC) (10731010 and 10628104); National Basic Research Program (2007CB814902); Creative Research Groups of China (10721101); Leading Academic Discipline Program, the 10th five-year plan of the 211 Project for Shanghai University of Finance and Economics; 211 Project for Shanghai University of Finance and Economics (the 3rd phase).
文摘This article is concerned with the estimating problem of semiparametric varyingcoefficient partially linear regression models. By combining the local polynomial and least squares procedures Fan and Huang (2005) proposed a profile least squares estimator for the parametric component and established its asymptotic normality. We further show that the profile least squares estimator can achieve the law of iterated logarithm. Moreover, we study the estimators of the functions characterizing the non-linear part as well as the error variance. The strong convergence rate and the law of iterated logarithm are derived for them, respectively.
文摘Statistical downscaling (SD) analyzes relationship between local-scale response and global-scale predictors. The SD model can be used to forecast rainfall (local-scale) using global-scale precipitation from global circulation model output (GCM). The objectives of this research were to determine the time lag of GCM data and build SD model using PCR method with time lag of the GCM precipitation data. The observations of rainfall data in Indramayu were taken from 1979 to 2007 showing similar patterns with GCM data on 1st grid to 64th grid after time shift (time lag). The time lag was determined using the cross-correlation function. However, GCM data of 64 grids showed multicollinearity problem. This problem was solved by principal component regression (PCR), but the PCR model resulted heterogeneous errors. PCR model was modified to overcome the errors with adding dummy variables to the model. Dummy variables were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction. The SD model with lag-GCM predictors was also better than SD model without lag-GCM.
文摘In the article, hypothesis test for coefficients in high dimensional regression models is considered. I develop simultaneous test statistic for the hypothesis test in both linear and partial linear models. The derived test is designed for growing p and fixed n where the conventional F-test is no longer appropriate. The asymptotic distribution of the proposed test statistic under the null hypothesis is obtained.
文摘We consider a functional partially linear additive model that predicts a functional response by a scalar predictor and functional predictors. The B-spline and eigenbasis least squares estimator for both the parametric and the nonparametric components proposed. In the final of this paper, as a result, we got the variance decomposition of the model and establish the asymptotic convergence rate for estimator.
文摘Medical research data are often skewed and heteroscedastic. It has therefore become practice to log-transform data in regression analysis, in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighed least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, in most situations produced correct type I error risk. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
Abstract: We construct a fuzzy varying-coefficient bilinear regression model for interval financial data and adopt the least-squares method based on the symmetric fuzzy number space. First, we propose a varying-coefficient model on the basis of the fuzzy bilinear regression model. Second, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with an SSE composite index. Finally, mean squared errors and mean absolute errors are employed to evaluate and compare both the fitting and the forecasting performance of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying-coefficient bilinear regression models. Empirical analysis shows that, for the capital market, the proposed model achieves good fitting and forecasting accuracy relative to the other regression models.
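The basic least-squares idea on symmetric fuzzy numbers can be sketched under a deliberately simple assumption. We do not reproduce the paper's "complete distance". Instead we assume d²(A, B) = (c_A − c_B)² + (s_A − s_B)² on (center, spread) pairs, with crisp inputs x ≥ 0. Under that assumption, minimizing the summed distance splits into two ordinary least-squares fits. The varying-coefficient and bilinear terms are omitted, and the function names are hypothetical.

```python
import numpy as np


def fit_symmetric_fuzzy_ls(x, centers, spreads):
    """Least squares for symmetric fuzzy outputs (center, spread) on a crisp input x >= 0.

    Assumed distance: d^2(A, B) = (cA - cB)^2 + (sA - sB)^2. With fuzzy
    coefficients (a_j, w_j) and x >= 0 the fitted output is
    (a0 + a1 x, w0 + w1 x), so the problem splits into two OLS fits.
    No non-negativity constraint is imposed on fitted spreads here.
    """
    D = np.column_stack([np.ones_like(x), x])
    a, *_ = np.linalg.lstsq(D, centers, rcond=None)
    w, *_ = np.linalg.lstsq(D, spreads, rcond=None)
    return a, w


def fuzzy_errors(x, centers, spreads, a, w):
    """Mean squared distance and mean distance between observed and fitted fuzzy numbers."""
    d2 = (centers - (a[0] + a[1] * x)) ** 2 + (spreads - (w[0] + w[1] * x)) ** 2
    return d2.mean(), np.sqrt(d2).mean()


# Noise-free toy data: true centers 1 + 2x, true spreads 0.5 + 0.1x.
x = np.linspace(0.0, 5.0, 50)
centers = 1.0 + 2.0 * x
spreads = 0.5 + 0.1 * x
a_hat, w_hat = fit_symmetric_fuzzy_ls(x, centers, spreads)
mse, mae = fuzzy_errors(x, centers, spreads, a_hat, w_hat)
```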
Funding: Supported by the National Social Science Foundation of China (Grant No. 22BTJ059).
Abstract: In this paper, we consider the partial linear regression model $y_i = x_i\beta^* + g(t_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $(x_i, t_i)$ are known fixed design points, $g(\cdot)$ is an unknown function, $\beta^*$ is an unknown parameter to be estimated, and the random errors $\varepsilon_i$ are $(\alpha, \beta)$-mixing random variables. The $p$-th ($p > 1$) mean consistency, strong consistency and complete consistency of the least squares estimators of $\beta^*$ and $g(\cdot)$ are investigated under some mild conditions. In addition, a numerical simulation is carried out to study the finite-sample performance of the theoretical results. Finally, a real data analysis is provided to further verify the effectiveness of the model.
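A common least-squares construction for this model uses a weight matrix W. It first forms partial residuals x̃ = x − Wx and ỹ = y − Wy, estimates β* from them, and then smooths y − xβ̂ to estimate g. The sketch below assumes Gaussian Nadaraya-Watson weights, a specific bandwidth, a random rather than fixed x, and AR(1) errors. The AR(1) errors are only a convenient dependent sequence and do not check the (α, β)-mixing conditions. All of these choices are ours, not necessarily the paper's.

```python
import numpy as np


def nw_weight_matrix(t, h):
    """Nadaraya-Watson weights W[i, j] = K((t_i - t_j) / h) / sum_k K((t_i - t_k) / h)."""
    u = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * u ** 2)                # Gaussian kernel (assumed choice)
    return K / K.sum(axis=1, keepdims=True)


def partial_linear_ls(x, y, t, h):
    """Weighted least-squares estimators of beta and g in y = x * beta + g(t) + e."""
    W = nw_weight_matrix(t, h)
    x_t = x - W @ x                          # partial residuals of x
    y_t = y - W @ y                          # partial residuals of y
    beta_hat = np.sum(x_t * y_t) / np.sum(x_t ** 2)
    g_hat = W @ (y - x * beta_hat)           # g estimated at the design points
    return beta_hat, g_hat


rng = np.random.default_rng(4)
n = 400
t = (np.arange(1, n + 1) - 0.5) / n          # equally spaced design on [0, 1]
x = rng.standard_normal(n)
g = np.sin(2 * np.pi * t)
# AR(1) errors: a simple dependent sequence, not a verification of (alpha, beta)-mixing.
innov = 0.3 * rng.standard_normal(n)
e = np.empty(n)
e[0] = innov[0] / np.sqrt(1 - 0.5 ** 2)
for i in range(1, n):
    e[i] = 0.5 * e[i - 1] + innov[i]
y = 2.0 * x + g + e
beta_hat, g_hat = partial_linear_ls(x, y, t, h=0.05)
```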
Abstract: Deformation prediction models of the Wuqiangxi concrete gravity dam are developed, including two statistical models and a deep learning model. In the statistical models, reliable monitoring data are first identified with the Lahitte criterion. Then, stepwise regression and partial least squares regression models for deformation prediction of the concrete gravity dam are constructed from the reliable monitoring data, with water pressure, temperature and time-effect factors considered in the models. Finally, the settlement curves of five typical measuring points on the crest of the Wuqiangxi concrete gravity dam are obtained with the stepwise regression and partial least squares regression models, using monitoring data from 2006 to 2020. The five points are J23 (dam section 24#), J33 (dam section 4#), J35 (dam section 8#), J37 (dam section 12#) and J39 (dam section 15#). A deep learning model is developed based on a long short-term memory (LSTM) recurrent neural network. The LSTM model uses two LSTM layers, the rectified linear unit function as the activation function, an input sequence length of 20, and random search. The monitoring data of the five typical measuring points from 2006 to 2017 are selected as the training set, and the data from 2018 to 2020 are taken as the test set. The case study shows that (1) both statistical models give good fitting results; (2) the partial least squares regression algorithm can handle models with highly correlated factors and explains the factors reasonably; and (3) the prediction accuracy of the LSTM model increases with the amount of training data. For deformation prediction of concrete gravity dams, the LSTM model is recommended when sufficient training data are available, whereas partial least squares regression is recommended when training data are insufficient.
Funding: Supported by the Natural Science Foundation of China under Grant Nos. 11231010, 11471223 and 11501586, BCMIIS, and the Key Project of Beijing Municipal Educational Commission under Grant No. KZ201410028030.
Abstract: This paper proposes a procedure for testing the regression coefficients in high-dimensional partially linear models based on the F-statistic. In the partially linear model, the authors first estimate the unknown nonlinear component by nonparametric methods and then generalize the F-statistic to test the regression coefficients under some regularity conditions. In this procedure, estimating the nonlinear component makes the properties of the generalized F-test considerably harder to establish. The authors obtain asymptotic properties of the generalized F-test in more general cases, including its asymptotic normality and its power with p/n ∈ (0, 1), without a normality assumption. The asymptotic result is general, and by adding some constraints, similar conclusions can be obtained for high-dimensional linear models. Through simulation studies, the authors demonstrate good finite-sample performance of the proposed test in comparison with the theoretical results. The practical utility of the method is illustrated with a real data example.
Abstract: We consider semiparametric partially linear regression models with mean function $X^{T}\beta + g(z)$, where X and z are functional data. New estimators of β and g(z) are presented, and some asymptotic results are given; in particular, the strong convergence rates of the proposed estimators are obtained. In our estimation, the number of observations per subject is completely flexible. A simulation study is conducted to investigate the finite-sample performance of the proposed estimators.
Abstract: Consider a partially linear regression model with an unknown vector parameter, an unknown function g(·) and unknown heteroscedastic error variances. Chen and You [23] proposed a semiparametric generalized least squares estimator (SGLSE) for the parameter vector, which takes the heteroscedasticity into account to increase efficiency. Inference based on this SGLSE requires a consistent estimator of its asymptotic covariance matrix. However, when within-group correlation exists, the traditional delta method and the delete-1 jackknife fail to provide such a consistent estimator. In this paper, a delete-group jackknife method based on deleting grouped partial residuals is examined. It is shown that the delete-group jackknife method does provide a consistent estimator of the asymptotic covariance matrix in the presence of within-group correlation. This result extends that in [21].
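The delete-group jackknife covariance has the familiar form V = (G − 1)/G · Σ_g (b₍₋g₎ − b̄)(b₍₋g₎ − b̄)ᵀ. Here b₍₋g₎ is the estimate recomputed with group g removed, and b̄ is the average of those G estimates. The sketch below makes two substitutions. It uses plain OLS as a stand-in for the SGLSE, and it deletes whole groups of rows rather than grouped partial residuals. It only shows the mechanics of the covariance formula, on simulated data with a shared group effect.

```python
import numpy as np


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def delete_group_jackknife_cov(X, y, groups, estimator=ols):
    """Delete-group jackknife covariance estimate of estimator(X, y).

    V = (G - 1) / G * sum_g (b_(-g) - b_bar)(b_(-g) - b_bar)^T,
    where b_(-g) is the estimate with all rows of group g removed.
    """
    labels = np.unique(groups)
    G = labels.size
    est = np.array([estimator(X[groups != g], y[groups != g]) for g in labels])
    dev = est - est.mean(axis=0)
    return (G - 1) / G * dev.T @ dev


rng = np.random.default_rng(7)
G, m = 30, 5                                      # 30 groups of 5 observations
groups = np.repeat(np.arange(G), m)
X = np.column_stack([np.ones(G * m), rng.standard_normal(G * m)])
u = np.repeat(rng.standard_normal(G), m)          # shared group effect -> within-group correlation
y = X @ np.array([1.0, 2.0]) + u + 0.5 * rng.standard_normal(G * m)
V = delete_group_jackknife_cov(X, y, groups)
```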
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the Fundamental Research Funds for the Central Universities, China (Grant No. 2023-3-2D-04).
Abstract: In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimensionality. The responses, the covariates, or the responses together with part of the covariates may be missing at random, and ultra-high dimensionality means that the dimension of the parameter is much larger than the sample size. Using B-splines for the varying-coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates, i.e., those with nonzero coefficients. We also discuss the asymptotic normality of the oracle estimator of the linear parameter. Since the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. The finite-sample behavior of the proposed methods is investigated via simulations and a real data analysis.
Funding: Supported by the University of Chinese Academy of Sciences under Grant No. Y95401TXX2, the Beijing Natural Science Foundation under Grant No. Z190004, and the Key Program of Joint Funds of the National Natural Science Foundation of China under Grant No. U19B2040.
Abstract: This paper considers tests for regression coefficients in high-dimensional partially linear models. The authors first use the B-spline method to estimate the unknown smooth function so that it can be expressed linearly. Then, the authors propose an empirical likelihood method to test the regression coefficients and show that the proposed test statistic has an asymptotic chi-squared distribution with two degrees of freedom under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method performs well in terms of type I error control and power. The proposed method is also applied to a Skin Cutaneous Melanoma (SKCM) dataset.
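The core computation of an empirical likelihood test can be sketched generically. Given estimating-function values gᵢ, compute −2 log R for H0: E[gᵢ] = 0 by solving for the Lagrange multiplier λ, and compare the result with a χ² distribution. The sketch uses generic two-dimensional gᵢ, matching the two degrees of freedom stated above. How the paper constructs gᵢ from the spline-approximated model is not reproduced. `el_statistic` is a hypothetical helper that uses a damped Newton method.

```python
import numpy as np
from scipy.stats import chi2


def el_statistic(g, max_iter=100, tol=1e-10):
    """-2 log empirical likelihood ratio for H0: E[g_i] = 0 (Owen-type).

    Maximizes sum_i log(1 + lam' g_i) over lam by damped Newton steps,
    keeping every implied weight positive.
    """
    n, d = g.shape
    lam = np.zeros(d)
    for _ in range(max_iter):
        w = 1.0 + g @ lam
        grad = (g / w[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(g.T * (1.0 / w ** 2)) @ g          # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while np.any(1.0 + g @ (lam - t * step) <= 1.0 / n):
            t /= 2.0                                # step halving keeps weights valid
        lam = lam - t * step
    return 2.0 * np.sum(np.log1p(g @ lam))


rng = np.random.default_rng(9)
g_null = rng.standard_normal((200, 2))              # mean zero: H0 holds
g_alt = g_null + 0.5                                # shifted mean: H0 fails
stat_null = el_statistic(g_null)
stat_alt = el_statistic(g_alt)
p_alt = chi2.sf(stat_alt, df=2)
```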
Funding: Partly supported by the National Natural Science Foundation of China under Grant Nos. 11571073 and 11701286, and by NSF, JS (BK20171073).
Abstract: Testing for heteroscedasticity determines whether a regression model predicts the dependent variable consistently across all values of the explanatory variables. Because existing tests cannot detect heteroscedasticity in all cases, particularly under heavy-tailed distributions, the authors establish a new comprehensive test statistic based on Levene's test. The authors derive the asymptotic normality of the test statistic under the null hypothesis of homoscedasticity, based on recent analysis-of-variance theory for an infinite number of factor levels. The proposed test applies Levene's test to the residuals from a regression fit of the mean function to assess homogeneity of variance. Simulation studies show that the test outperforms other methods in almost all cases, even when the variance is a nonlinear function. Finally, the proposed method is applied to a real dataset.
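The residual-based idea can be sketched with SciPy's `levene`: fit the mean function, group the residuals by the explanatory variable, and apply Levene's test to the groups. The sketch has three simplifications. It uses a fixed, small number of quantile bins, whereas the paper's theory lets the number of factor levels grow without bound. It uses a linear mean function. It uses the median-centred (Brown-Forsythe) variant. All three are our choices for illustration.

```python
import numpy as np
from scipy import stats


def levene_residual_test(x, y, n_groups=5):
    """Levene-type heteroscedasticity check on OLS residuals grouped by quantiles of x."""
    D = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ beta                                   # residuals from the mean-function fit
    edges = np.quantile(x, np.linspace(0, 1, n_groups + 1))
    labels = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_groups - 1)
    groups = [r[labels == k] for k in range(n_groups)]
    return stats.levene(*groups, center="median")      # Brown-Forsythe variant


rng = np.random.default_rng(10)
x = rng.uniform(0, 3, 600)
y_het = 1 + 2 * x + (0.2 + x) * rng.standard_normal(x.size)   # error SD grows with x
y_hom = 1 + 2 * x + rng.standard_normal(x.size)
res_het = levene_residual_test(x, y_het)
res_hom = levene_residual_test(x, y_hom)
```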