Abstract
Monitoring data streams in distributed systems has been a very active research area in recent years. This paper studies how to make distributed top-k monitoring both generic and efficient, where top-k monitoring means continuously reporting the k largest values over multiple distributed data streams according to a user-specified ranking function. In practice, the ranking function supplied by the user may be arbitrary, yet existing distributed top-k monitoring techniques support only summation as the ranking function. We present GMR, a general distributed top-k monitoring algorithm that supports any continuous, strictly monotone aggregation function. The communication cost of GMR is independent of k. We evaluate the efficiency of GMR on both real-world and synthetic data sets. The experiments show that GMR's network communication volume is more than an order of magnitude lower than that of comparable methods.
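To make the problem statement concrete, the following minimal sketch computes a top-k answer the naive way: every node ships all of its local values to a coordinator, which applies the ranking function and picks the k best objects. This is not GMR, whose details are not given in the abstract; it is only a centralized baseline under assumed names (`naive_top_k`, `nodes`) and assumed data, and it treats an object missing at a node as contributing 0.0.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence


def naive_top_k(
    local_values: List[Dict[str, float]],
    rank: Callable[[Sequence[float]], float],
    k: int,
) -> List[str]:
    """Return the k objects with the largest aggregated score.

    local_values[i] maps object id -> value observed at node i (hypothetical layout).
    rank aggregates one object's per-node values into a single score.
    """
    # Collect every object id seen at any node.
    objects = set().union(*local_values)
    # Aggregate each object's per-node values; missing values default to 0.0 (assumption).
    scores = {
        obj: rank([node.get(obj, 0.0) for node in local_values])
        for obj in objects
    }
    # Sort ids first so the result is deterministic, then take the k best by score.
    return heapq.nlargest(k, sorted(scores), key=scores.get)


# Illustrative data: two nodes, three objects (values are made up).
nodes = [
    {"a": 1.0, "b": 4.0, "c": 6.0},
    {"a": 9.0, "b": 4.0, "c": 1.0},
]

top_by_sum = naive_top_k(nodes, sum, 2)            # scores: a=10, b=8, c=7
top_by_product = naive_top_k(nodes, math.prod, 2)  # scores: a=9, b=16, c=6
```

With the sum as ranking function the answer is `["a", "b"]`, while the product (strictly monotone on positive values) gives `["b", "a"]`, so the choice of ranking function changes the result. The baseline retransmits every value on every update, which is the communication cost a monitoring algorithm aims to avoid.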
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2007, No. 2, pp. 125-128 (4 pages)
Computer Science
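Funding
National Key Basic Research Program of China ("973" Program) (2005CB321804)
National High Technology Research and Development Program of China ("863" Program) (2004A112020)
National High Technology Research and Development Program of China ("863" Program) (2005AA112030)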