In this paper, we consider distributed inference for heterogeneous linear models with massive datasets. Noting that heterogeneity may exist not only in the expectations of the subpopulations but also in their variances, we propose the heteroscedasticity-adaptive distributed aggregation (HADA) estimation, which is shown to be communication-efficient and asymptotically optimal regardless of homoscedasticity or heteroscedasticity. Furthermore, a distributed test for parameter heterogeneity across subpopulations is constructed based on the HADA estimator. The finite-sample performance of the proposed methods is evaluated using simulation studies and the NYC flight data.
Funding: Supported by the National Science Foundation of China (Grant No. 12271014), the China Postdoctoral Science Foundation (Grant No. 2022M720334), and the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant No. 23YJCZH259).