Research on Workflow Optimisers Based on MapReduce

Abstract: MapReduce workflows are optimised mainly by optimising the MapReduce stack. To address the problem of MapReduce workflow optimisation, we first introduce the relevant concepts. We then describe the cost-based optimisation process for MapReduce workflows and use examples to explain the dataflow dependencies and resource dependencies within a MapReduce workflow. On this basis, we propose three MapReduce workflow optimisers and evaluate them end to end. Finally, we measure the optimisation overhead of the workflow optimisers experimentally and compare the three optimisers.

Author: Yuan Kaiyin (袁开银)

Affiliation: Modern Educational Technology Center, Henan University of Economics and Law

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 10, pp. 54-58, 85 (6 pages)

Keywords: MapReduce; workflow optimisation; dataflow dependency; resource dependency; workflow optimiser

Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]


