A Heuristic Cloud Bursting Algorithm for Big-Data Analytics Services

Cited by: 1


Abstract: Parallelising big-data analytics services across a federation of heterogeneous clouds can improve performance. However, because transferring big data over the network incurs significant delay, there is an inherent trade-off between the degree of parallelisation and analytics performance. To address this, the paper proposes a heuristic cloud bursting algorithm for parallelised big-data analytics services. The algorithm first determines which computing nodes in the federated clouds should take part in parallel processing. It then allocates the data to these nodes so that they finish processing at the same time with the best performance. Finally, it determines the order in which the differently sized data chunks allocated to each node are computed, so that the transfer of each chunk completes, as far as possible, while the node is still computing the previous chunk. In a comparison with other load-balancing algorithms, the proposed algorithm improved performance by 20% to 60%.
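The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which this record does not reproduce. It assumes a simple linear cost model: each node has a fixed startup delay, a transfer rate, and a compute rate. The names `Node`, `finish_time`, `plan`, and `pipelined_makespan`, and all numbers, are hypothetical. `plan` ignores transfer/compute overlap, while `pipelined_makespan` models that overlap separately for a single node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    latency_s: float       # assumed fixed startup delay before work can begin
    bandwidth_mb_s: float  # assumed transfer rate to the node
    compute_mb_s: float    # assumed processing rate on the node

    @property
    def cost_per_mb(self) -> float:
        # Seconds per MB when transfer and compute are not overlapped.
        return 1.0 / self.bandwidth_mb_s + 1.0 / self.compute_mb_s


def finish_time(nodes, data_mb):
    """Common finish time T if all `nodes` finish at the same moment.

    Solves sum_i (T - latency_i) / cost_i = data_mb for T.
    """
    inv = sum(1.0 / n.cost_per_mb for n in nodes)
    lat = sum(n.latency_s / n.cost_per_mb for n in nodes)
    return (data_mb + lat) / inv


def plan(nodes, data_mb):
    """Steps 1 and 2: pick nodes greedily, then split data for equal finish."""
    chosen = []
    best_t = float("inf")
    for node in sorted(nodes, key=lambda n: n.latency_s):
        # A node whose startup delay is not below the current finish time
        # cannot shorten the job, so parallelism stops growing here.
        if node.latency_s >= best_t:
            break
        chosen.append(node)
        best_t = finish_time(chosen, data_mb)
    shares = {n.name: (best_t - n.latency_s) / n.cost_per_mb for n in chosen}
    return best_t, shares


def pipelined_makespan(chunks_mb, bandwidth_mb_s, compute_mb_s):
    """Step 3: finish time on one node when chunk k+1 is transferred
    while chunk k is being computed, for a given chunk order."""
    arrived = 0.0  # time at which the current chunk has fully arrived
    done = 0.0     # time at which the previous chunk finished computing
    for size in chunks_mb:
        arrived += size / bandwidth_mb_s
        done = max(arrived, done) + size / compute_mb_s
    return done


# Hypothetical federation: one local node and two remote clouds.
nodes = [
    Node("local", latency_s=0.0, bandwidth_mb_s=1000.0, compute_mb_s=50.0),
    Node("cloudA", latency_s=5.0, bandwidth_mb_s=100.0, compute_mb_s=100.0),
    Node("cloudB", latency_s=200.0, bandwidth_mb_s=10.0, compute_mb_s=200.0),
]
t, shares = plan(nodes, data_mb=1000.0)
```

Under these made-up numbers the slow-to-reach `cloudB` is left out, which reflects the trade-off the abstract describes: adding nodes helps only while their network delay is smaller than the time they save. For the chunk order, calling `pipelined_makespan` with the chunks sorted smallest first versus largest first shows how a short first transfer reduces the initial idle time on the node.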

Authors: Shi Jianzheng (史建政), Yu Dongmin (于东敏)

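Affiliations: Langfang Vocational and Technical College (廊坊职业技术学院); Hebei University of Technology, Langfang Branch (河北工业大学廊坊分校)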

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, Issue 2, pp. 249-254, 260 (7 pages)

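Funding: Teaching reform project supported by the Hebei Provincial Department of Education (103004); project of the Ministry of Education Higher Vocational Education Committee (jzw590111050)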

Keywords: Federated clouds; Big-data analytics; Parallel processing; Cloud bursting; Load balancing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


