Funding: This work was supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant No. 12271272), and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Abstract: In this paper, we consider unified optimal subsampling estimation and inference on the low-dimensional parameter of main interest in the presence of a nuisance parameter for low/high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimation, which has a slow convergence rate. The consistency and asymptotic normality of the resultant subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in low-dimensional GLMs and perform better than the uniform subsampling scheme in high-dimensional GLMs. A two-step algorithm is further proposed for implementation, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications to census income and Fashion-MNIST datasets also demonstrate their practical applicability.
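The two-step idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's decorrelated score algorithm: it assumes a one-parameter logistic model with no nuisance parameter, and it uses the classical low-dimensional form of L-optimality-style probabilities, proportional to |y_i - p_i| |x_i|. The function names `weighted_logistic_fit` and `two_step_subsample` and the pilot size `r0` are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def weighted_logistic_fit(xs, ys, ws, iters=25):
    """Newton's method for the weighted model P(y=1|x) = sigmoid(beta * x)."""
    beta = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = sigmoid(beta * x)
            score += w * x * (y - p)            # weighted score
            info += w * x * x * p * (1.0 - p)   # weighted information
        if info <= 0.0:
            break
        beta += score / info
    return beta


def two_step_subsample(xs, ys, r0, r, rng):
    """Assumed illustration: uniform pilot, then nonuniform subsampling."""
    n = len(xs)
    # Step 1: uniform pilot subsample gives a rough estimate of beta.
    pilot = [rng.randrange(n) for _ in range(r0)]
    beta0 = weighted_logistic_fit(
        [xs[i] for i in pilot], [ys[i] for i in pilot], [1.0] * r0
    )
    # Step 2: probabilities proportional to |y_i - p_i| * |x_i|.
    raw = [abs(ys[i] - sigmoid(beta0 * xs[i])) * abs(xs[i]) for i in range(n)]
    total = sum(raw)
    probs = [v / total for v in raw]
    # Sample with replacement and reweight by inverse probabilities.
    idx = rng.choices(range(n), weights=probs, k=r)
    beta = weighted_logistic_fit(
        [xs[i] for i in idx], [ys[i] for i in idx], [1.0 / probs[i] for i in idx]
    )
    return beta, probs
```

The inverse-probability weights keep the subsample score unbiased for the full-data score, which is why the second fit targets the same parameter as a full-data fit while touching only `r` observations.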
Funding: This research was also supported by the National Natural Science Foundation of China (11501208, 11871287) and the Fundamental Research Funds for the Central Universities.
Abstract: We thank Tang and Ju for their review on statistical inference for univariate response data with nonignorable missingness. In this paper, we mainly discuss some issues on longitudinal data with nonignorable dropout. In research areas such as medicine, population health, economics, social sciences and sample surveys, data are often collected from every sampled subject at T time points; such data are referred to as longitudinal data. Let Y = (y1, ..., yT) be a T-dimensional vector of the study variable with distribution denoted by p(Y), and let X be a q-dimensional time-independent continuous covariate associated with Y.
Funding: Supported by the National Natural Science Foundation of China (11871287, 11831008), the Natural Science Foundation of Tianjin (18JCYBJC41100), the Fundamental Research Funds for the Central Universities, the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, the Chinese 111 Project (B14019), and the U.S. National Science Foundation (DMS-1612873 and DMS-1914411); partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1409-21219).
Abstract: Quantile treatment effects can be important causal estimands in the evaluation of biomedical treatments or interventions for health outcomes such as medical cost and utilisation. We consider their estimation in observational studies with many possible covariates under the assumption that treatment and potential outcomes are independent conditional on all covariates. To obtain valid and efficient treatment effect estimators, we replace the set of all covariates by lower-dimensional sets for estimation of the quantiles of potential outcomes. These lower-dimensional sets are obtained using sufficient dimension reduction tools and are outcome specific. We justify our choice from an efficiency point of view. We prove the asymptotic normality of our estimators, and our theory is complemented by simulation results and an application to data from the University of Wisconsin Health Accountable Care Organization.
Funding: This work was partially supported by the National Natural Science Foundation of China grants 11831008 and 11871287, the U.S. National Science Foundation grants DMS-1612873 and DMS-1914411, the Natural Science Foundation of Tianjin (18JCYBJC41100), and the Fundamental Research Funds for the Central Universities.
Abstract: A popular imputation method used to compensate for item nonresponse in sample surveys is the nearest neighbour imputation (NNI) method, which uses a covariate to define neighbours. When the covariate is multivariate, however, NNI suffers from the well-known curse of dimensionality and gives unstable results. As a remedy, we propose a single-index NNI when the conditional mean of the response given the covariates follows a single-index model. For estimating the population mean or quantiles, we establish the consistency and asymptotic normality of the single-index NNI estimators. Some limited simulation results are presented to examine the finite-sample performance of the proposed estimator of the population mean.
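As a rough illustration of the basic NNI step described above, the following sketch imputes each missing response with the response of the respondent whose scalar covariate is closest. It is only a sketch under stated assumptions: the function name `nearest_neighbour_impute` is hypothetical, missing responses are encoded as `None`, and ties go to the first donor. In a single-index version, the scalar `x` would presumably be an estimated index (a linear combination of the covariates); estimating that index is not shown here.

```python
def nearest_neighbour_impute(x, y):
    """Fill each missing y (None) with the y of the respondent nearest in x."""
    # Donors are the units whose response was observed.
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not donors:
        raise ValueError("at least one observed response is required")
    imputed = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Nearest donor by absolute covariate distance (first donor on ties).
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        imputed.append(yi)
    return imputed


def imputed_mean(x, y):
    """Population mean estimate computed from the completed data."""
    completed = nearest_neighbour_impute(x, y)
    return sum(completed) / len(completed)
```

Because the distance is computed on a single scalar, the neighbour search stays one-dimensional regardless of how many covariates feed into that scalar, which is the motivation the abstract gives for the single-index variant.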