This paper develops a new robust U-type test for high dimensional regression coefficients using the estimated U-statistic of order two and refitted cross-validation error variance estimation. The limiting null distribution of the proposed test is proved to be normal under two kinds of ordinary models. We further study the local power of the proposed test and compare it with other competitive tests for high dimensional data. The refitted cross-validation approach is used to reduce the bias of the sample variance in the estimation of the test statistic. Our theoretical results indicate that the proposed test can achieve a substantial power gain over the test by Zhong and Chen (2011) when testing a hypothesis with outlying observations and heavy-tailed distributions. We assess the finite-sample performance of the proposed test by examining its size and power via Monte Carlo studies, and we illustrate its application with an empirical analysis of a real data example.
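As a rough illustration of the refitted cross-validation idea mentioned in this abstract, the following is a minimal sketch of a generic two-split variance estimator: variables are selected on one half of the data, the model is refitted by least squares on the other half, and the two residual variances are averaged. It is not the paper's procedure. The screening rule (`screen_top_k`, marginal correlation), the model size `k`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def screen_top_k(X, y, k):
    """Hypothetical screening rule: keep the k columns with the largest
    absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def ols_residual_variance(X, y):
    """Degrees-of-freedom-corrected residual variance of an OLS fit with intercept."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid / (n - Z.shape[1])

def rcv_variance(X, y, k, rng):
    """Select on one half, refit on the other half, then average both directions."""
    n = X.shape[0]
    perm = rng.permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]
    sel_a = screen_top_k(X[a], y[a], k)   # variables chosen using half a only
    sel_b = screen_top_k(X[b], y[b], k)   # variables chosen using half b only
    var_b = ols_residual_variance(X[b][:, sel_a], y[b])  # refit on half b
    var_a = ols_residual_variance(X[a][:, sel_b], y[a])  # refit on half a
    return 0.5 * (var_a + var_b)

# Simulated sparse high dimensional data (assumed setup): true error variance is 1.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0
y = X @ beta + rng.standard_normal(n)

sigma2_hat = rcv_variance(X, y, k=10, rng=rng)
print(sigma2_hat)
```

Because the refit never reuses the observations that drove the selection, spuriously correlated variables picked on one half do not shrink the residual variance on the other half, which is the source of the bias reduction.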
This paper studies the re-adjusted cross-validation method and a semiparametric regression model called the varying index coefficient model (VICM). We use the profile spline modal estimator to estimate the coefficients of the parametric part of the VICM, while the unknown functions are expanded in a B-spline basis. Moreover, we combine these two estimation methods under the assumption of high-dimensional data. Simulation results and an empirical analysis show that, for the varying index coefficient model, the re-adjusted cross-validation method is more accurate and more stable than traditional methods based on ordinary least squares.
In this paper, we study how to estimate the error density in the ultrahigh dimensional sparse additive model, where the number of variables is larger than the sample size. First, a smoothing method based on B-splines is applied to the estimation of the regression functions. Second, an improved two-stage refitted cross-validation (RCV) procedure based on random splitting is used to obtain the residuals of the model, and the residual-based kernel method is then applied to estimate the error density function. Under suitable sparsity conditions, the large sample properties of the estimator are obtained, including weak and strong consistency, asymptotic normality, and the law of the iterated logarithm. In particular, the relationship between the sparsity and the convergence rate of the kernel density estimator is given. The methodology is illustrated by simulations and a real data example, which suggest that the proposed method performs well.
This paper focuses on error density estimation in the ultrahigh dimensional sparse linear model, where the error term may have a heavy-tailed distribution. First, an improved two-stage refitted cross-validation method, combined with robust variable screening procedures such as RRCS and variable selection methods such as LAD-SCAD, is used to obtain the submodel. The residual-based kernel density method is then applied to estimate the error density through LAD regression. Under given conditions, the large sample properties of the estimator are established. In particular, we explicitly give the relationship between the sparsity and the convergence rate of the kernel density estimator. The simulation results show that the proposed error density estimator performs well. A real data example is presented to illustrate our methods.
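The residual-based kernel density step described in the two error density abstracts above can be sketched as follows. This is only a minimal illustration, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth; neither choice is stated in the abstracts. The residuals here are simulated heavy-tailed stand-ins rather than the output of an RCV refit, and `kernel_density` is a hypothetical helper name.

```python
import numpy as np

def kernel_density(residuals, grid, bandwidth=None):
    """Gaussian kernel density estimate of the error density, evaluated on grid."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed default, not taken from the papers)
        q75, q25 = np.percentile(r, [75, 25])
        spread = min(r.std(ddof=1), (q75 - q25) / 1.34)
        bandwidth = 0.9 * spread * n ** (-0.2)
    u = (grid[:, None] - r[None, :]) / bandwidth
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))

# Stand-in residuals: heavy-tailed draws in place of residuals from a fitted submodel.
rng = np.random.default_rng(1)
residuals = rng.standard_t(df=3, size=500)

grid = np.linspace(-8.0, 8.0, 401)
f_hat = kernel_density(residuals, grid)

# Riemann-sum check that the estimate integrates to roughly one over the grid.
area = f_hat.sum() * (grid[1] - grid[0])
print(area)
```

In the setting of the abstracts, `residuals` would instead come from refitting the selected submodel on the half of the data not used for selection.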
Multiply robust inference has attracted much attention recently in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models while the resulting estimator remains consistent as long as one of the candidate models is correctly specified. This property is appealing because it gives the user a flexible modeling strategy with better protection against model misspecification. We explore this property for regression models with a binary covariate that is missing at random. Starting from a reformulation of the celebrated augmented inverse probability weighted estimating equation, we propose a novel combination of least squares and empirical likelihood to handle separately each of the two types of multiple candidate models, one for the missing variable regression and the other for the missingness mechanism. Owing to this separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of our procedure is illustrated through simulation studies and the analysis of a dementia data set collected by the National Alzheimer's Coordinating Center.
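For readers unfamiliar with the augmented inverse probability weighted (AIPW) form that the abstract above starts from, here is a minimal sketch in the simplest textbook setting: estimating a mean when the response is missing at random. It does not reproduce the paper's estimator, which targets a missing binary covariate and combines several candidate models. The simulated data, the linear outcome working model, and the use of the true observation probability `pi` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # true mean of y is 1.0

# Missingness mechanism: probability that y is observed given x (treated as known here).
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = (rng.random(n) < pi).astype(float)        # r = 1 means y is observed
obs = r == 1.0

# Outcome working model: linear regression fitted on complete cases only.
Z = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(Z[obs], y[obs], rcond=None)
m = Z @ coef                                   # predicted y for every unit

# AIPW estimate: inverse-probability-weighted term plus augmentation term.
y_obs = np.where(obs, y, 0.0)                  # unobserved y never enters the estimate
mu_aipw = np.mean(r * y_obs / pi - (r - pi) / pi * m)

# Naive complete-case mean for comparison; biased because observation depends on x.
mu_cc = y[obs].mean()
print(mu_aipw, mu_cc)
```

The AIPW estimate stays consistent if either the missingness model or the outcome model is correct. Multiply robust procedures extend this protection to several candidate models of each type.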
Funding (robust U-type test paper): Supported by the National Natural Science Foundation of China (Grant Nos. 11071022, 11231010 and 11471223), the Beijing Center for Mathematics and Information Interdisciplinary Science, and the Key Project of Beijing Municipal Educational Commission (Grant No. KZ201410028030).
Funding (error density estimation in the sparse additive model): Supported by the National Natural Science Foundation of China (Grant Nos. 11971324 and 11471223), the Interdisciplinary Construction of Bioinformatics and Statistics, and the Academy for Multidisciplinary Studies, Capital Normal University.
Funding (error density estimation in the sparse linear model): Supported by the National Natural Science Foundation of China (Grant No. 11971324) and the State Key Program of National Natural Science Foundation of China (Grant No. 12031016).
Funding (multiply robust inference paper): Supported by the National Natural Science Foundation of China (Grant No. 11301031).