Funding: This paper was supported by the National Natural Science Foundation of China [grant number 11431006], [grant number 11690015], [grant number 11371202], [grant number 11622104].
Abstract: New technological advancements, combined with powerful computer hardware and high-speed networks, make big data available. The massive sample size of big data introduces unique computational challenges for the scalability and storage of statistical methods. In this paper, we focus on the lack-of-fit test of parametric regression models under the framework of big data. We develop a computationally feasible testing approach by integrating the divide-and-conquer algorithm into a powerful nonparametric test statistic. Our theoretical results show that, under mild conditions, the asymptotic null distribution of the proposed test is standard normal. Furthermore, the proposed test benefits from the use of a data-driven bandwidth procedure and thus possesses a certain adaptive property. Simulation studies show that the proposed method has satisfactory performance, and it is illustrated with an analysis of an airline data set.
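To make the divide-and-conquer idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes a simple linear null model, a kernel-smoothed residual statistic of the Zheng type on each block, a fixed rule-of-thumb bandwidth in place of the data-driven one mentioned in the abstract, and aggregation by summing block statistics and dividing by the square root of the number of blocks. All function names (`fit_line`, `block_statistic`, `dc_lack_of_fit`, `simulate_regression`) are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def block_statistic(x, y, h):
    """Self-normalized kernel statistic S1 / sqrt(2 * S2) on one block.

    S1 sums K_ij * e_i * e_j over i != j and S2 sums the squared terms,
    where e are residuals from the fitted line and K is a Gaussian kernel.
    """
    a, b = fit_line(x, y)
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    s1 = s2 = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            k = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
            term = k * e[i] * e[j]
            s1 += 2 * term        # each unordered pair appears twice in i != j
            s2 += 2 * term ** 2
    return s1 / math.sqrt(2 * s2)


def dc_lack_of_fit(x, y, n_blocks):
    """Combine block statistics; assumes the rows are already in random order."""
    size = len(x) // n_blocks     # leftover rows beyond n_blocks * size are dropped
    h = size ** (-0.2)            # assumed rule-of-thumb bandwidth, not data-driven
    stats = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        stats.append(block_statistic(x[lo:hi], y[lo:hi], h))
    return sum(stats) / math.sqrt(n_blocks)


def simulate_regression(n, curvature, seed):
    """Toy data: y = 1 + 2x + curvature * x^2 + N(0, 1) noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-2, 2) for _ in range(n)]
    y = [1 + 2 * xi + curvature * xi * xi + rng.gauss(0, 1) for xi in x]
    return x, y


x_null, y_null = simulate_regression(600, 0.0, seed=1)
x_alt, y_alt = simulate_regression(600, 1.5, seed=1)
print("linear truth   :", dc_lack_of_fit(x_null, y_null, 3))
print("quadratic truth:", dc_lack_of_fit(x_alt, y_alt, 3))
```

Each block costs on the order of the squared block size, so splitting n rows into blocks reduces the pairwise work roughly by a factor equal to the number of blocks. Under this sketch's assumptions, a large positive combined value would be compared against an upper standard normal quantile.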
Funding: This research was partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1409-21219). This research was also supported by the National Natural Science Foundation of China (11501208), the Fundamental Research Funds for the Central Universities, the National Social Science Foundation (13BTJ009), the Chinese 111 Project grant (B14019), and the U.S. National Science Foundation (DMS-1305474 and DMS-1612873).
Abstract: We consider the estimation of a causal treatment effect using nonparametric regression or inverse propensity weighting, together with sufficient dimension reduction for searching low-dimensional covariate subsets. A special case of this problem is the estimation of a response effect with data having ignorable missing response values. An issue that is not well addressed in the literature is whether the estimation of the low-dimensional covariate subsets by sufficient dimension reduction has an impact on the asymptotic variance of the resulting causal effect estimator. Relying on some incorrect or inaccurate statements, many researchers believe that the estimation of the low-dimensional covariate subsets by sufficient dimension reduction does not affect the asymptotic variance. We rigorously establish a result showing that this is not true unless the low-dimensional covariate subsets include some covariates superfluous for estimation, and including such covariates loses efficiency. Our theory is supplemented by some simulation results.
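As a point of reference for the inverse propensity weighting mentioned above, here is a minimal sketch under strong simplifying assumptions: a single covariate, a known logistic propensity score, and no dimension reduction step. It therefore does not reproduce the efficiency question studied in the paper, which concerns estimated covariate subsets. The names `ipw_ate`, `naive_difference`, and `simulate_observational` are hypothetical.

```python
import math
import random


def ipw_ate(treat, y, propensity):
    """IPW estimate of E[Y(1)] - E[Y(0)] given propensity scores e(X)."""
    total = 0.0
    for t, yi, e in zip(treat, y, propensity):
        # Treated units are weighted by 1/e, control units by 1/(1 - e).
        total += t * yi / e - (1 - t) * yi / (1 - e)
    return total / len(y)


def naive_difference(treat, y):
    """Unweighted difference in group means, which ignores confounding."""
    y1 = [yi for t, yi in zip(treat, y) if t == 1]
    y0 = [yi for t, yi in zip(treat, y) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def simulate_observational(n, effect, seed):
    """Toy data where X drives both treatment assignment and outcome."""
    rng = random.Random(seed)
    treat, y, prop = [], [], []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        e = 1 / (1 + math.exp(-x))          # assumed known propensity score
        t = 1 if rng.random() < e else 0
        treat.append(t)
        y.append(effect * t + x + rng.gauss(0, 1))
        prop.append(e)
    return treat, y, prop


treat, y, prop = simulate_observational(20000, 2.0, seed=0)
print("IPW estimate  :", ipw_ate(treat, y, prop))
print("naive estimate:", naive_difference(treat, y))
```

In this toy design the covariate raises both the chance of treatment and the outcome, so the unweighted comparison tends to overstate the effect, while the weighted estimate targets the simulated effect of 2.0.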