Memory-saving algorithm for distributed top-k monitoring (低内存占用的分布式top-k监测算法)


Abstract: To improve the real-time performance and availability of distributed top-k monitoring over data streams in big-data settings, this paper studies how a distributed system that monitors multiple data streams processes its data, and proposes a memory-saving distributed top-k monitoring algorithm. Using a limited amount of memory, the algorithm rearranges the critical data that would otherwise be scattered irregularly across the nodes, classifies the situations that can arise during data processing, and assigns a processing procedure to each case according to the rearrangement and classification results. As a result, a large share of data updates can be completed with no network communication, or with only a small amount of it, which reduces network traffic during monitoring and improves system availability while preserving real-time monitoring. Experimental results show that the algorithm is effective and feasible.
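The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea that most updates can be absorbed locally. It is not the authors' method. It assumes, for illustration, that each object lives on exactly one node and that every node caches the current top-k set plus a threshold placed midway between the k-th and (k+1)-th largest values. The names `Coordinator`, `Node`, `resolve`, and `report` are hypothetical. The `messages` counter counts only violation reports; the traffic needed to resynchronize after a report is not modeled.

```python
class Coordinator:
    """Hypothetical coordinator that tracks the global top-k set."""

    def __init__(self, k, nodes):
        self.k = k
        self.nodes = nodes
        self.messages = 0  # number of violation reports received
        for node in nodes:
            node.coordinator = self
        self.resolve()

    def resolve(self):
        # Full recomputation: gather all values, rank them, and push the
        # new top-k set and threshold to every node.
        values = {}
        for node in self.nodes:
            values.update(node.values)
        ranked = sorted(values, key=lambda obj: (-values[obj], obj))
        self.topk = set(ranked[:self.k])
        if len(ranked) > self.k:
            kth = values[ranked[self.k - 1]]
            next_best = values[ranked[self.k]]
            self.threshold = (kth + next_best) / 2
        else:
            self.threshold = float("-inf")
        for node in self.nodes:
            node.topk = set(self.topk)
            node.threshold = self.threshold

    def report(self):
        # A node saw an update that may invalidate the cached top-k set.
        self.messages += 1
        self.resolve()


class Node:
    """Hypothetical monitoring node that owns a disjoint set of objects."""

    def __init__(self, values):
        self.values = dict(values)
        self.topk = set()
        self.threshold = float("-inf")
        self.coordinator = None

    def update(self, obj, delta):
        # Apply the update locally, then check the cached constraint.
        self.values[obj] = self.values.get(obj, 0) + delta
        value = self.values[obj]
        if obj in self.topk:
            violated = value < self.threshold
        else:
            violated = value > self.threshold
        if violated:
            self.coordinator.report()  # only this path uses the network
        return violated


node_a = Node({"a": 10, "b": 8})
node_b = Node({"c": 3, "d": 1})
coordinator = Coordinator(2, [node_a, node_b])

node_b.update("c", 1)   # stays below the threshold: handled locally
node_a.update("a", -2)  # stays above the threshold: handled locally
node_b.update("c", 5)   # crosses the threshold: reported to the coordinator
print(coordinator.topk, coordinator.messages)
```

In this run only the third update reaches the coordinator. The midpoint threshold is a design choice for the sketch: it leaves slack on both sides, so small fluctuations of either top-k or non-top-k objects do not trigger communication.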

Authors: Feng Dawei (冯大伟), Sun Ruizhi (孙瑞志), Cao Zhenli (曹振丽)

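Affiliations: Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture; Yantai Institute, China Agricultural University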

Source: Computer Engineering and Design (《计算机工程与设计》), Peking University core journal, 2015, No. 3, pp. 658-663 (6 pages)

Funding: National Key Technology R&D Program of China (2012BAK17B09, 2012BAJ18B07)

Keywords: top-k; online monitoring; memory-saving; data stream; distributed; big data

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]



