Abstract
To improve the real-time performance and availability of distributed top-k monitoring over data streams in big-data settings, this paper studies how a distributed system that monitors multiple data streams processes its data, and proposes a distributed top-k monitoring algorithm with a low memory footprint. The algorithm uses a limited amount of memory to rearrange the key data that was originally scattered in no particular order across the nodes, and classifies the situations that can arise during data processing. A processing procedure is then specified for each case based on the rearrangement and classification results, so that a large share of data updates can be completed with no network communication, or with only a small amount. This reduces network traffic during monitoring and improves system availability while preserving real-time monitoring. Experimental results show that the algorithm is effective and feasible.
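The abstract does not give the algorithm's details, so the following is only a generic illustration of the underlying idea that many stream updates can be absorbed locally without contacting other machines. It is not the paper's method. The names `Coordinator`, `Node`, and `slack` are hypothetical, and the scheme shown (report a local change only once it exceeds a slack threshold) yields an approximate top-k, since each node may withhold up to `slack` units per item.

```python
from collections import defaultdict
import heapq


class Coordinator:
    """Hypothetical central site: keeps the sum of counts reported by nodes."""

    def __init__(self, k):
        self.k = k
        self.totals = defaultdict(int)  # item -> sum of reported changes
        self.messages = 0               # number of network messages received

    def report(self, item, delta):
        # Each call stands in for one network message from a node.
        self.messages += 1
        self.totals[item] += delta

    def top_k(self):
        # Approximate top-k: unreported local changes are not included.
        return heapq.nlargest(
            self.k, self.totals, key=lambda item: (self.totals[item], item)
        )


class Node:
    """Hypothetical monitoring node: buffers small changes locally."""

    def __init__(self, coordinator, slack):
        self.coordinator = coordinator
        self.slack = slack                # assumed per-item tolerance
        self.pending = defaultdict(int)   # item -> change not yet reported

    def update(self, item, delta):
        # Absorb the update locally; communicate only when the buffered
        # change exceeds the slack threshold.
        self.pending[item] += delta
        if abs(self.pending[item]) > self.slack:
            self.coordinator.report(item, self.pending[item])
            self.pending[item] = 0
```

With `slack = 2`, a node that receives six unit increments for one item sends two messages instead of six. A larger slack trades accuracy of the reported ranking for less traffic.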
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2015, Issue 3, pp. 658-663 (6 pages)
Computer Engineering and Design
Funding
National Key Technology R&D Program of China (2012BAK17B09, 2012BAJ18B07)