Research on Workflow Optimisers Based on MapReduce
Abstract: A MapReduce workflow is optimised mainly by optimising the MapReduce stack. To address the problem of MapReduce workflow optimisation, we first present the related concepts. Second, we introduce the cost-based optimisation process for MapReduce workflows. Third, we use examples to explain the dataflow dependencies and resource dependencies in a MapReduce workflow. Building on this, we present three MapReduce workflow optimisers and carry out an end-to-end evaluation of them. Finally, we evaluate the optimisation overhead of the workflow optimisers experimentally and compare the three optimisers.
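The abstract distinguishes dataflow dependencies from resource dependencies between the jobs of a MapReduce workflow. The following sketch is not taken from the paper. It assumes a common reading of the two terms: a dataflow dependency exists when one job consumes a dataset that another job produces, and a resource dependency exists between jobs that no dataflow path orders, so they may run at the same time and compete for cluster resources. The job names, dataset names and the `JOBS` dictionary are hypothetical.

```python
from itertools import combinations

# Hypothetical workflow: each job lists the datasets it reads and writes.
# Names are illustrative only and do not come from the paper.
JOBS = {
    "J1": {"reads": {"logs"}, "writes": {"d1"}},
    "J2": {"reads": {"users"}, "writes": {"d2"}},
    "J3": {"reads": {"d1", "d2"}, "writes": {"d3"}},
    "J4": {"reads": {"d1"}, "writes": {"d4"}},
}


def dataflow_dependencies(jobs):
    """Return (producer, consumer) pairs where the consumer reads a dataset
    that the producer writes."""
    edges = set()
    for producer, p in jobs.items():
        for consumer, c in jobs.items():
            if producer != consumer and p["writes"] & c["reads"]:
                edges.add((producer, consumer))
    return edges


def reachable(edges, start):
    """Return every job reachable from `start` by following dataflow edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def resource_dependencies(jobs, edges):
    """Return job pairs with no dataflow path in either direction.

    Assumption: such jobs may be scheduled concurrently and therefore
    compete for the same cluster resources (slots, memory, disk, network).
    """
    reach = {j: reachable(edges, j) for j in jobs}
    return {
        (a, b)
        for a, b in combinations(sorted(jobs), 2)
        if b not in reach[a] and a not in reach[b]
    }


if __name__ == "__main__":
    flow = dataflow_dependencies(JOBS)
    print(sorted(flow))
    print(sorted(resource_dependencies(JOBS, flow)))
```

For this toy workflow the dataflow dependencies are J1→J3, J1→J4 and J2→J3, while the pairs (J1, J2), (J2, J4) and (J3, J4) come out as resource dependencies. Under this reading, a cost-based workflow optimiser has to account for both relations: a dataflow dependency means one job's configuration can change the size and layout of its consumers' input, and a resource dependency means the configurations of concurrently running jobs affect each other's performance.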
Author: 袁开银 (Yuan Kaiyin)
Source: 《计算机应用与软件》 (Computer Applications and Software), CSCD, 2015, No. 10, pp. 54-58, 85 (6 pages in total)
Keywords: MapReduce; workflow; optimisation; dataflow dependency; resource dependency; workflow optimiser