
Distributed Two-step Subsampling Algorithm for Logistic Regression Model

Cited by: 4
Abstract: With the advent of the big data era, distributed storage systems have come into wide use, which poses considerable challenges for data analysis. Building on the two-step subsampling algorithm of Wang et al. (2018) (reference [1] in the paper), this paper proposes a distributed two-step subsampling algorithm. The resulting parameter estimators are consistent and asymptotically normal, and their convergence rate is given. The algorithm is further evaluated through numerical simulations and prediction on real datasets. The results show that the distributed two-step subsampling algorithm is more accurate than simple random sampling and that, compared with the full-data approach, it loses very little accuracy while saving CPU time and improving computational efficiency.
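The abstract describes the method only at a high level, so the following is a minimal sketch of how a distributed two-step subsampling estimator for logistic regression might look, not the authors' implementation. Several choices here are assumptions: the pilot sample is drawn uniformly, the second-step probabilities are taken proportional to |y_i - p_i| * ||x_i|| (a form commonly associated with the two-step approach of Wang et al. (2018)), each data block is processed independently, and the per-block estimates are combined by a plain average. The function names (`weighted_logistic_fit`, `two_step_block`, `distributed_two_step`) and the parameters `r0`, `r`, and `n_blocks` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def weighted_logistic_fit(X, y, w=None, iters=50, tol=1e-8):
    """Weighted logistic-regression MLE via Newton's method (no intercept added)."""
    n, d = X.shape
    # Rescaling the weights does not change the maximizer; it only helps conditioning.
    w = np.ones(n) if w is None else w / w.mean()
    beta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        # A tiny ridge term guards against a singular Hessian.
        step = np.linalg.solve(hess + 1e-8 * np.eye(d), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_block(X, y, r0, r, rng):
    """Two-step subsampling on a single data block."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample and pilot estimate (assumed design).
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_fit(X[idx0], y[idx0])
    # Step 2: probabilities proportional to |y - p| * ||x|| (assumed criterion).
    score = np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    # Fit on both subsamples with inverse-probability weights
    # (pilot points were drawn with probability 1/n each).
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    return weighted_logistic_fit(X[idx], y[idx], w)


def distributed_two_step(X, y, n_blocks=4, r0=200, r=1000, seed=0):
    """Run the two-step procedure per block and average the block estimates."""
    rng = np.random.default_rng(seed)
    # Rows are split into contiguous blocks to mimic distributed storage.
    blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    estimates = [two_step_block(X[b], y[b], r0, r, rng) for b in blocks]
    # Plain averaging is an assumption; the paper may aggregate differently.
    return np.mean(estimates, axis=0)
```

Calling `distributed_two_step(X, y)` on a feature matrix `X` and a 0/1 response vector `y` returns one coefficient vector. In this sketch each block touches only `r0 + r` rows when fitting, which is where the saving in CPU time relative to a full-data fit would come from.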
Authors: LI Li-li (李莉莉); DU Mei-hui (杜梅慧); ZHANG Xuan (张璇) (School of Economics, Qingdao University, Qingdao 266100, China; School of Economics, Nankai University, Tianjin 300071, China; China National of Standardization, Beijing 100088, China)
Source: Journal of Applied Statistics and Management (《数理统计与管理》), CSSCI, Peking University Core Journal, 2022, No. 5, pp. 858-866 (9 pages)
Funding: National Social Science Fund of China project (2019BTJ028).
Keywords: big data; distributed storage; two-step subsampling algorithm; logistic regression model
