In this paper, we consider the unified optimal subsampling estimation and inference on the lowdimensional parameter of main interest in the presence of the nuisance parameter for low/high-dimensionalgeneralized linear...In this paper, we consider the unified optimal subsampling estimation and inference on the lowdimensional parameter of main interest in the presence of the nuisance parameter for low/high-dimensionalgeneralized linear models (GLMs) with massive data. We first present a general subsampling decorrelated scorefunction to reduce the influence of the less accurate nuisance parameter estimation with the slow convergencerate. The consistency and asymptotic normality of the resultant subsample estimator from a general decorrelatedscore subsampling algorithm are established, and two optimal subsampling probabilities are derived under theA- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposedoptimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in thelow-dimensional GLMs and perform better than the uniform subsampling scheme in the high-dimensional GLMs.A two-step algorithm is further proposed to implement, and the asymptotic properties of the correspondingestimators are also given. Simulations show satisfactory performance of the proposed estimators, and twoapplications to census income and Fashion-MNIST datasets also demonstrate its practical applicability.展开更多
基金This work was supported by the Fundamental Research Funds for the Central Universities,National Natural Science Foundation of China(Grant No.12271272)and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
文摘In this paper, we consider the unified optimal subsampling estimation and inference on the lowdimensional parameter of main interest in the presence of the nuisance parameter for low/high-dimensionalgeneralized linear models (GLMs) with massive data. We first present a general subsampling decorrelated scorefunction to reduce the influence of the less accurate nuisance parameter estimation with the slow convergencerate. The consistency and asymptotic normality of the resultant subsample estimator from a general decorrelatedscore subsampling algorithm are established, and two optimal subsampling probabilities are derived under theA- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposedoptimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in thelow-dimensional GLMs and perform better than the uniform subsampling scheme in the high-dimensional GLMs.A two-step algorithm is further proposed to implement, and the asymptotic properties of the correspondingestimators are also given. Simulations show satisfactory performance of the proposed estimators, and twoapplications to census income and Fashion-MNIST datasets also demonstrate its practical applicability.