Distributed Estimation for Heterogeneous Big Data (异质性大数据的分布式估计)
Abstract: With advances in Internet of Things (IoT) technology, big data poses great challenges to network bandwidth and computer storage capacity, making traditional centralized data processing difficult to carry out. This has spurred the development of distributed statistical learning, whose core idea is known in statistics as divide-and-conquer (DC) and is attracting growing attention from statisticians. In research on non-iterative algorithms, Zhang et al. (2013) showed that when the number of data sets is s = O(■), the DC simple-average estimator based on local empirical risk minimization attains a mean squared error convergence rate of O(N^{-1}). Building on this, Huang and Huo (2019) proposed a distributed one-step estimator in the M-estimation framework using a Newton-Raphson iteration. Neither method, however, accounts for how heterogeneity in massive data may affect DC estimation. This paper proposes a DC one-step weighted estimator for massive heterogeneous data in the linear model framework, proves its asymptotic properties, and uses them to test for heterogeneity. The proposed method is applied to real medical insurance data from the United States. The results show that, compared with simple-average estimation, it fits the linear trend of the data better and significantly improves computational efficiency.
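Authors: Guo Jingxuan (郭婧璇); Xu Huichao (徐慧超); Zhu Wanqing (祝婉晴); Tian Maozai (田茂再)
Source: Statistical Research (《统计研究》), CSSCI, Peking University Core Journal, 2020, No. 10, pp. 104-114 (11 pages)
Funding: Renmin University of China Scientific Research Fund (supported by the Fundamental Research Funds for the Central Universities), project "Robust Statistical Theory and Applications for Big Data Analysis" (18XNL012).
Keywords: Divide-and-conquer; One-step estimator; Big data; Heterogeneity; Medical insurance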
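
The two baseline estimators named in the abstract, the DC simple average of local fits and the one-step Newton-Raphson update, can be illustrated for a linear model with the minimal sketch below. It is not the paper's weighted estimator, whose weights the abstract does not specify. The function names, the simulated data, and the block-specific noise levels are all assumptions made for illustration.

```python
import numpy as np


def local_ols(X, y):
    """Ordinary least squares fit on a single data block."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def dc_simple_average(blocks):
    """Simple average of the local OLS estimates over all blocks."""
    return np.mean([local_ols(X, y) for X, y in blocks], axis=0)


def dc_one_step(blocks, theta0):
    """One Newton-Raphson step from theta0 for squared-error loss.

    Each block contributes only its local Hessian X'X and its local
    gradient X'(X theta0 - y); raw data never leaves the block.
    """
    p = theta0.shape[0]
    hess = np.zeros((p, p))
    grad = np.zeros(p)
    for X, y in blocks:
        hess += X.T @ X
        grad += X.T @ (X @ theta0 - y)
    return theta0 - np.linalg.solve(hess, grad)


def make_blocks(s=10, n=500, seed=0):
    """Simulate s blocks of size n (hypothetical data, not the paper's)."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, -2.0, 0.5])
    blocks = []
    for k in range(s):
        X = rng.normal(size=(n, 3))
        sigma = 0.5 + 0.2 * k  # assumed block-specific noise level
        y = X @ theta + sigma * rng.normal(size=n)
        blocks.append((X, y))
    return theta, blocks


theta_true, blocks = make_blocks()
theta_avg = dc_simple_average(blocks)        # first round: average local fits
theta_one = dc_one_step(blocks, theta_avg)   # second round: one Newton step
```

Because squared-error loss is quadratic, the one-step update in this sketch coincides with the pooled least-squares fit whatever the starting value. For a general M-estimator the step is only an approximation to the full-data solution.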