
Research on Optimization of Sorting Algorithm Based on MapReduce (cited by: 3)
Abstract: MapReduce has become the standard parallel computing model for big data. To keep all nodes participating in a MapReduce computation highly load-balanced while minimizing space usage, CPU time, I/O time, and network transfer overhead, this paper proposes a design specification for optimized MapReduce algorithms that optimizes these metrics simultaneously while preserving good parallelism. It then gives a theoretical analysis of sorting, the most important algorithm in the data processing field, presents the optimal algorithm under these multi-metric constraints, and proves that it satisfies the proposed specification. Experiments verify that the optimized sorting algorithm is strictly better than traditional sorting algorithms in both effectiveness and efficiency.
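Authors: Jiang Yong (蒋勇), Zhao Zuopeng (赵作鹏)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, Peking University Core Journal, 2015, No. 4, pp. 410-417 (8 pages)
Funding: Natural Science Foundation of Jiangsu Province, Grant No. BK2012129
Keywords: MapReduce; optimization algorithm; big data; sorting algorithm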
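
The abstract does not describe the optimized sorting algorithm itself, so the following is not the authors' method. It is only a minimal single-process sketch of a common way to load-balance a MapReduce-style sort: sample the input, derive range split points from the sample, route each key to the reducer that owns its range, and let every reducer sort independently. All function names and parameters (`choose_splitters`, `map_phase`, `reduce_phase`, `mapreduce_sort`, `sample_size`) are hypothetical and chosen for illustration.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def choose_splitters(data: Sequence[int], num_reducers: int,
                     sample_size: int, seed: int = 0) -> List[int]:
    """Pick up to num_reducers - 1 split points from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(data), min(sample_size, len(data))))
    if not sample:
        return []  # empty input: a single (empty) partition
    # Evenly spaced sample quantiles approximate the input's quantiles.
    return [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]


def map_phase(data: Sequence[int], splitters: List[int]) -> List[List[int]]:
    """Route each key to the partition that owns its key range."""
    partitions: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    for key in data:
        partitions[bisect.bisect_right(splitters, key)].append(key)
    return partitions


def reduce_phase(partitions: List[List[int]]) -> List[List[int]]:
    """Each reducer sorts its own range independently of the others."""
    return [sorted(part) for part in partitions]


def mapreduce_sort(data: Sequence[int], num_reducers: int = 4,
                   sample_size: int = 100,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return the sorted keys and the number of keys each reducer handled."""
    splitters = choose_splitters(data, num_reducers, sample_size, seed)
    sorted_parts = reduce_phase(map_phase(data, splitters))
    # Key ranges are disjoint and ordered, so concatenation is globally sorted.
    output = [key for part in sorted_parts for key in part]
    loads = [len(part) for part in sorted_parts]
    return output, loads
```

Calling `mapreduce_sort(keys, num_reducers=4, sample_size=200)` returns the sorted keys together with per-reducer loads; comparing `max(loads)` with the mean load gives a rough measure of the load imbalance that the abstract aims to minimize. This sketch ignores the other metrics named in the abstract (space, CPU, I/O, and network cost).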
