Abstract
With the advent of the big data era, distributed storage systems have come into wide use, which poses considerable challenges for data analysis. Building on the two-step subsampling algorithm of Wang et al. (2018) [1], this paper proposes a distributed two-step subsampling algorithm. The resulting parameter estimators are shown to be consistent and asymptotically normal, and their convergence rate is presented. The algorithm is further evaluated through numerical simulations and prediction on real datasets. The results show that the distributed two-step subsampling algorithm is more accurate than simple random sampling and that, compared with the full-data approach, it saves CPU time and improves efficiency with only a small loss of accuracy.
Authors
李莉莉
杜梅慧
张璇
LI Li-li; DU Mei-hui; ZHANG Xuan (School of Economics, Qingdao University, Qingdao 266100, China; School of Economics, Nankai University, Tianjin 300071, China; China National Institute of Standardization, Beijing 100088, China)
Source
《数理统计与管理》
CSSCI
PKU Core (Peking University Core Journals)
2022, No. 5, pp. 858-866 (9 pages)
Journal of Applied Statistics and Management
Funding
National Social Science Fund of China project (2019BTJ028).