Big data analysis based on iterative MapReduce for hybrid cloud (cited by: 2)
Abstract Running data-intensive workloads such as big-data analysis on existing hybrid cloud platforms incurs high data-migration overhead and long execution times. To address this, a big-data analysis method based on iterative MapReduce is proposed. A data storage and migration mechanism migrates the initial invariant data from the private cloud to the public cloud during the first iteration, without requiring any modification to the MapReduce framework or the underlying storage layer. In addition, a random forest (RF) algorithm is used to estimate the computation time required by the proposed iterative MapReduce application. Experiments on an OpenStack-based hybrid cloud show that, compared with the traditional approach, the proposed method increases the running time of the initial iteration only and reduces the final completion time by more than 12.6%. The error rate of the proposed performance prediction method stays within 19.54%.
Authors YAN Ye (颜烨); ZHANG Xue-wen (张学文); WANG Li-jing (王立婧) (College of Electrical Information, City College of Science and Technology, Chongqing University, Chongqing 402167, China; School of Mechanical Engineering, Beihua University, Jilin 132021, China; College of Humanities, City College of Science and Technology, Chongqing University, Chongqing 402167, China)
Source Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2021, No. 4, pp. 1028-1035 (8 pages)
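Funding Natural Science Foundation of Jilin Province (20150101025JC); National Science and Technology Major Project on High-end CNC Machine Tools and Basic Manufacturing Equipment (2015ZX040003002); 2018 Chongqing Undergraduate University Fund for the Construction of Specialty Programs in Big Data and Intelligence (渝教高发[2018]12号).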
Keywords hybrid cloud; big-data analysis; iterative MapReduce; data migration; random forest algorithm; performance prediction
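
The trade-off stated in the abstract (a slower first iteration in exchange for a shorter overall completion time) can be illustrated with a simple cost model. The sketch below is not the authors' implementation. The function names, the per-iteration cost breakdown, and all numeric values are hypothetical, chosen only to show how a one-off migration of the invariant data might pay for itself over several iterations.

```python
def traditional_total(iterations: int, compute_s: float, remote_read_s: float) -> float:
    """Assumed baseline: every iteration re-reads the invariant input
    across the private/public cloud boundary."""
    return iterations * (compute_s + remote_read_s)


def migrate_once_total(iterations: int, compute_s: float,
                       migrate_s: float, local_read_s: float) -> float:
    """Assumed proposed scheme: the first iteration pays a one-off
    migration cost; later iterations read the migrated copy locally."""
    if iterations <= 0:
        return 0.0
    first = compute_s + migrate_s
    rest = (iterations - 1) * (compute_s + local_read_s)
    return first + rest


if __name__ == "__main__":
    # Hypothetical timings in seconds; none of these come from the paper.
    n, compute, remote, migrate, local = 10, 60.0, 20.0, 30.0, 5.0
    base = traditional_total(n, compute, remote)
    ours = migrate_once_total(n, compute, migrate, local)
    print(f"traditional: {base:.1f} s, migrate-once: {ours:.1f} s")
    print(f"relative saving: {(base - ours) / base:.2%}")
```

Under this model, a single iteration is slower with migration, and the saving grows with the number of iterations. The actual figures reported in the abstract come from the authors' OpenStack experiments, not from a model of this kind.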
