Funding: This work was supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant No. 12271272), and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Abstract: In this paper, we consider unified optimal subsampling estimation and inference on the low-dimensional parameter of main interest in the presence of a nuisance parameter for low/high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimation with its slow convergence rate. The consistency and asymptotic normality of the resultant subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in the low-dimensional GLMs and perform better than the uniform subsampling scheme in the high-dimensional GLMs. A two-step algorithm is further proposed for implementation, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications to census income and Fashion-MNIST datasets also demonstrate their practical applicability.
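The two-step idea (a uniform pilot subsample, then a second subsample drawn with data-dependent probabilities and fitted with inverse-probability weights) can be illustrated with a minimal sketch. This is not the decorrelated score algorithm of the abstract: it assumes a plain logistic regression with one covariate and no nuisance parameter, and it uses probabilities proportional to |y_i - p_i| * ||x_i||, an L-optimality-style choice familiar from the logistic-regression subsampling literature. All function names, sample sizes, and the simulated data are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, ws, iters=25):
    """Weighted logistic regression (intercept + one slope) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = sigmoid(b0 + b1 * x)
            r = w * (y - p)          # weighted score contribution
            v = w * p * (1.0 - p)    # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def two_step_subsample(xs, ys, r0, r, rng):
    n = len(xs)
    # Step 1: a uniform pilot subsample gives a rough estimate.
    pilot = [rng.randrange(n) for _ in range(r0)]
    p0, p1 = fit_logistic([xs[i] for i in pilot], [ys[i] for i in pilot], [1.0] * r0)
    # Step 2: probabilities proportional to |y_i - p_i| * ||(1, x_i)||.
    scores = [abs(ys[i] - sigmoid(p0 + p1 * xs[i])) * math.hypot(1.0, xs[i])
              for i in range(n)]
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(n), weights=probs, k=r)
    # Inverse-probability weights keep the subsample score unbiased.
    ws = [1.0 / (n * probs[i]) for i in idx]
    return fit_logistic([xs[i] for i in idx], [ys[i] for i in idx], ws)

# Simulated full data (assumed true intercept -0.5 and slope 1.0).
rng = random.Random(0)
N = 20000
xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
ys = [1.0 if rng.random() < sigmoid(-0.5 + 1.0 * x) else 0.0 for x in xs]
b0, b1 = two_step_subsample(xs, ys, r0=500, r=2000, rng=rng)
print(b0, b1)
```

Only the second-stage subsample of size r enters the final fit here; the one pass over the full data is needed only to compute the sampling probabilities.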
Funding: Supported by the National Science Foundation of USA (Grant Nos. DMS1812048, DMS-1737857, DMS-1513483 and DMS-1418042) and the National Natural Science Foundation of China (Grant No. 11529101).
Abstract: Model selection strategies have been routinely employed to determine a model for data analysis in statistics, and further study and inference then often proceed as though the selected model were the true model and had been known a priori. Model averaging approaches, on the other hand, try to combine estimators for a set of candidate models. Specifically, instead of deciding which model is the 'right' one, a model averaging approach fits a set of candidate models and averages over the estimators using data-adaptive weights. In this paper we establish a general frequentist model averaging framework that does not set any restrictions on the set of candidate models. It broadens the scope of the existing methodologies under the frequentist model averaging development. Assuming the data are from an unknown model, we derive the model averaging estimator and study its limiting distributions and related predictions while taking possible modeling biases into account. We propose a set of optimal weights to combine the individual estimators so that the expected mean squared error of the averaged estimator is minimized. Simulation studies are conducted to compare the performance of the estimator with that of the existing methods. The results show the benefits of the proposed approach over traditional model selection approaches as well as existing model averaging methods.
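The basic mechanics of frequentist model averaging (fit every candidate, then combine with data-adaptive weights) can be sketched as follows. The weighting rule used here is smoothed AIC, an existing scheme chosen for simplicity; it is not the optimal MSE-minimizing weights proposed in the abstract. The two candidate models, the function names, and the simulated data are assumptions made for illustration.

```python
import math
import random

def fit_mean(xs, ys):
    """Candidate 1: intercept-only model. Returns (predict, rss, n_params)."""
    m = sum(ys) / len(ys)
    rss = sum((y - m) ** 2 for y in ys)
    return (lambda x: m), rss, 1

def fit_line(xs, ys):
    """Candidate 2: simple linear regression by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    icpt = my - slope * mx
    rss = sum((y - icpt - slope * x) ** 2 for x, y in zip(xs, ys))
    return (lambda x: icpt + slope * x), rss, 2

def smoothed_aic_weights(fits, n):
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k.
    aics = [n * math.log(rss / n) + 2 * k for _, rss, k in fits]
    best = min(aics)  # subtract the minimum for numerical stability
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [v / total for v in raw]

def averaged_predictor(xs, ys):
    fits = [fit_mean(xs, ys), fit_line(xs, ys)]
    weights = smoothed_aic_weights(fits, len(xs))

    def predict(x):
        # Weighted combination of the candidate predictions.
        return sum(w * f(x) for w, (f, _, _) in zip(weights, fits))

    return predict, weights

# Simulated data with a weak slope, so neither candidate clearly dominates.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [1.0 + 0.2 * x + rng.gauss(0.0, 1.0) for x in xs]
predict, weights = averaged_predictor(xs, ys)
print(weights, predict(1.0))
```

Because the weights are nonnegative and sum to one, the averaged prediction always lies between the candidate predictions, which is what distinguishes averaging from picking a single winner.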
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11471029, 11101014 and 11301279), the Beijing Natural Science Foundation (Grant No. 1142002), the Science and Technology Project of Beijing Municipal Education Commission (Grant No. KM201410005010), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 12KJB110016), a CERG Grant from the Hong Kong Research Grants Council (Grant No. HKBU 202012), and an FRG Grant from Hong Kong Baptist University (Grant No. FRG2/12-13/077).
Abstract: We consider the problem of variable selection for fixed effects varying coefficient models. A variable selection procedure is developed using basis function approximations and group nonconcave penalized functions, and the fixed effects are removed using proper weight matrices. The proposed procedure simultaneously removes the fixed individual effects, selects the significant variables and estimates the nonzero coefficient functions. With appropriate selection of the tuning parameters, an asymptotic theory for the resulting estimates is established under suitable conditions. Simulation studies are carried out to assess the performance of the proposed method, and a real data set is analyzed for further illustration.
Abstract: The vagina contains at least a billion microbial cells, dominated by lactobacilli. Here we perform metagenomic shotgun sequencing on cervical and fecal samples from a cohort of 516 Chinese women of reproductive age, as well as cervical, fecal, and salivary samples from a second cohort of 632 women. Factors such as pregnancy history, delivery history, cesarean section, and breastfeeding were all more important than menstrual cycle in shaping the microbiome, and such information would be necessary before trying to interpret differences between vagino-cervical microbiome data. A greater proportion of Bifidobacterium breve was seen with older age at sexual debut. The relative abundance of lactobacilli, especially Lactobacillus crispatus, was negatively associated with pregnancy history. Potential markers for lack of menstrual regularity, heavy flow, dysmenorrhea, and contraceptives were also identified. Lactobacilli were rare during breastfeeding or post-menopause. Other features such as mood fluctuations and facial speckles could potentially be predicted from the vagino-cervical microbiome. Gut and salivary microbiomes, plasma vitamins, metals, amino acids, and hormones showed associations with the vagino-cervical microbiome. Our results offer an unprecedented glimpse into the microbiota of the female reproductive tract and call for international collaborations to better understand its long-term health impact other than in the settings of infection or pre-term birth.
Abstract: The authors should be congratulated on their timely contribution to this emerging field with a comprehensive review, which will certainly attract more researchers into this area. In the simplest one-shot approach, the entire dataset is distributed on multiple machines, each machine computes a local estimate based on local data only, and a central machine performs an aggregation calculation as a final processing step. In more complicated settings, multiple communications are carried out, typically also passing first-order information (gradient) and/or second-order information (Hessian matrix) between local machines and the central machine. This review clearly separates the existing works in this area into several sections, considering parametric regression, nonparametric regression, and other models including principal component analysis and variable screening.
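The one-shot approach described above can be sketched in a few lines. This is a minimal single-process simulation, assuming a simple linear regression as the local estimator and plain averaging as the aggregation step; the function names and the way the data are sharded are hypothetical, and no real communication layer is involved.

```python
import random

def local_ols(chunk):
    """Least-squares (intercept, slope) computed from one machine's data only."""
    n = len(chunk)
    mx = sum(x for x, _ in chunk) / n
    my = sum(y for _, y in chunk) / n
    sxx = sum((x - mx) ** 2 for x, _ in chunk)
    sxy = sum((x - mx) * (y - my) for x, y in chunk)
    slope = sxy / sxx
    return my - slope * mx, slope

def one_shot_average(data, n_machines):
    # Split the data into roughly equal shards, one per simulated machine.
    shards = [data[k::n_machines] for k in range(n_machines)]
    # Each machine works on its own shard with no access to the others.
    local = [local_ols(s) for s in shards]
    # Central step: one round of communication, then a simple average.
    icpt = sum(e[0] for e in local) / n_machines
    slope = sum(e[1] for e in local) / n_machines
    return icpt, slope

# Simulated data (assumed true intercept 2.0 and slope 3.0).
rng = random.Random(0)
data = []
for _ in range(10000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)))

print(one_shot_average(data, n_machines=10))
print(local_ols(data))  # full-data estimate, for comparison
```

Each machine sends only two numbers to the center, which is the appeal of the one-shot scheme; the multi-round methods mentioned in the abstract exchange gradients or Hessians instead.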