Abstract
With the arrival of the big data era, distributed systems have come into wide use in everyday life. However, because a distributed system places no limit on the number of servers, heterogeneity across servers can be high, which may affect the results of statistical inference. Statistical diagnosis in distributed systems is therefore necessary. To this end, an outlier detection method suited to quantile regression models in distributed systems is proposed. In view of the practical application background, group deletion (where a group is a subset of the distributed system) is used to capture the influence of marginal correlation, and the statistical diagnosis is carried out within a relatively robust model. The method performs well in Monte Carlo simulation studies, and its effectiveness is further verified by applying it to real data from air quality monitoring stations.
Authors
陈实
姜荣
CHEN Shi; JIANG Rong (School of Mathematics and Statistics, Donghua University, Shanghai 201600, China; School of Mathematics, Physics and Statistics, Shanghai Polytechnic University, Shanghai 201209, China)
Source
《上海第二工业大学学报》
2024, No. 3, pp. 307-314 (8 pages)
Journal of Shanghai Polytechnic University
Keywords
statistical diagnosis
distributed system
quantile regression
group deletion