
MapReduce Parallel Implementation of the BP-AdaBoost Classification Algorithm
Abstract: When classifying massive data sets, time and space complexity become the bottleneck of traditional algorithms. Building on an analysis of the traditional BP-AdaBoost algorithm, and in combination with a cloud computing platform, this paper presents a MapReduce parallelization of BP-AdaBoost. The Map function computes the prediction error ε_t of each weak classifier and performs the relabeling; the Reduce function merges the intermediate results produced by the Map function to compute the average error, which is then used by the next round of MapReduce jobs. Deployed on a Hadoop cluster, the improved algorithm performs efficient, parallel strong classification of massive data. Three comparative experiments on the cluster verify the feasibility of the algorithm: it handles massive data, reduces the algorithm's time complexity, and achieves good speedup and accuracy.
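The division of work between Map and Reduce described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes the standard AdaBoost weighted error and weight update, replaces the BP network with a hypothetical one-feature threshold rule (`weak_classify`), and simulates the Hadoop splits with plain Python lists. All function names are illustrative.

```python
import math

def weak_classify(x, threshold=0.5):
    # Hypothetical stand-in for a trained BP network: a one-feature threshold rule.
    return 1 if x > threshold else -1

def map_split(samples, weights, classify):
    """Map step (sketch): score one data split with the current weak classifier.

    Returns the misclassified weight mass on this split, the total weight of
    the split, and a per-sample flag marking misclassified samples.
    """
    flags = [classify(x) != y for x, y in samples]
    err_mass = sum(w for w, wrong in zip(weights, flags) if wrong)
    return err_mass, sum(weights), flags

def reduce_error(map_outputs):
    """Reduce step (sketch): merge per-split results into one global error."""
    err_mass = sum(out[0] for out in map_outputs)
    weight_mass = sum(out[1] for out in map_outputs)
    return err_mass / weight_mass

def classifier_weight(eps):
    # Standard AdaBoost coefficient; assumes 0 < eps < 1.
    return 0.5 * math.log((1.0 - eps) / eps)

# Two simulated splits of four (feature, label) pairs each, with uniform weights.
splits = [
    [(0.1, -1), (0.4, -1), (0.6, 1), (0.9, -1)],
    [(0.2, -1), (0.3, 1), (0.7, 1), (0.8, 1)],
]
split_weights = [[0.125] * 4, [0.125] * 4]

# One boosting round: map over the splits, then reduce to a global error.
map_outputs = [map_split(s, w, weak_classify) for s, w in zip(splits, split_weights)]
eps = reduce_error(map_outputs)
alpha = classifier_weight(eps)

# Reweight for the next round: misclassified samples gain weight, the rest lose it.
raw = [
    [w * math.exp(alpha if wrong else -alpha) for w, wrong in zip(ws, out[2])]
    for ws, out in zip(split_weights, map_outputs)
]
total = sum(sum(ws) for ws in raw)
new_weights = [[w / total for w in ws] for ws in raw]

print(f"eps={eps:.3f} alpha={alpha:.4f}")
```

Each split only needs its own samples and weights to produce its error contribution, which is why this step parallelizes. Only the small per-split summaries have to be merged before the next round can start.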
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2014, No. 8, pp. 261-264 (4 pages)
Funding: National Natural Science Foundation of China (31271615)
Keywords: cloud computing; BP-AdaBoost; MapReduce; massive data; Hadoop cluster