Abstract
With the rapid development of the Internet, the Internet of Things (IoT), and cloud computing, data volumes are growing explosively. However, the massive circulation of all kinds of information makes it impossible to obtain complete data, and handling missing data quickly and efficiently is a major challenge. In a big-data setting, the paper stores the data on different sub-machines and, using a distributed optimization method, builds a Logistic model for the indicator variable of covariates that are missing at random. Based on this model, it proposes a surrogate likelihood function for parameter estimation. Simulation and empirical results show that the proposed distributed optimization method based on the surrogate likelihood function outperforms the averaging-based OneShot method.
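The abstract does not spell out the estimator, so the following is only a minimal sketch of the two strategies it contrasts, not the authors' implementation. It assumes the surrogate likelihood has the commonly used form in which one machine minimizes its local loss corrected by the difference between its local gradient and the pooled gradient at an initial estimate. The simulated data, the names `Z`, `d`, `newton`, and `shift`, and the choice of the OneShot average as the initial estimate are all hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(theta, Z, d):
    """Gradient of the average negative log-likelihood of a logistic model."""
    return Z.T @ (sigmoid(Z @ theta) - d) / len(d)

def hess(theta, Z):
    """Hessian of the average negative log-likelihood."""
    p = sigmoid(Z @ theta)
    return (Z * (p * (1.0 - p))[:, None]).T @ Z / Z.shape[0]

def newton(Z, d, shift=None, theta=None, iters=50):
    """Minimise (average negative log-likelihood) - <shift, theta> by Newton's method."""
    theta = np.zeros(Z.shape[1]) if theta is None else theta.copy()
    shift = np.zeros(Z.shape[1]) if shift is None else shift
    for _ in range(iters):
        step = np.linalg.solve(hess(theta, Z), grad(theta, Z, d) - shift)
        theta -= step
        if np.linalg.norm(step) < 1e-10:
            break
    return theta

# Hypothetical simulated data: d is the missingness indicator (1 = covariate observed),
# Z holds an intercept and two always-observed variables.
rng = np.random.default_rng(0)
N, m = 5000, 10
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
theta_true = np.array([0.5, 1.0, -1.0])
d = rng.binomial(1, sigmoid(Z @ theta_true)).astype(float)

# Split the sample evenly across m sub-machines.
blocks = np.array_split(np.arange(N), m)

# OneShot: every machine fits its own model, then the estimates are averaged.
local = [newton(Z[b], d[b]) for b in blocks]
theta_oneshot = np.mean(local, axis=0)

# Surrogate likelihood (assumed form): machines send gradients at theta0,
# and machine 1 minimises its gradient-corrected local loss.
theta0 = theta_oneshot
global_grad = np.mean([grad(theta0, Z[b], d[b]) for b in blocks], axis=0)
Z1, d1 = Z[blocks[0]], d[blocks[0]]
shift = grad(theta0, Z1, d1) - global_grad
theta_sur = newton(Z1, d1, shift=shift, theta=theta0)

# Full-sample estimate, for reference only (not available in a truly distributed setting).
theta_full = newton(Z, d)

print("OneShot  :", theta_oneshot)
print("Surrogate:", theta_sur)
print("Full data:", theta_full)
```

In this sketch each machine communicates only a parameter vector or a gradient vector, never raw data, which is the practical appeal of both strategies. The sketch does not reproduce the paper's simulation design, so it should not be read as evidence for the reported comparison.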
Authors
Pan Yingli (潘莹丽)
Liu Zhan (刘展)
Cai Wen (蔡雯)
School of Mathematics and Statistics, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal
2020, Issue 22, pp. 23-26 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 11901175).