MapReduce Parallel Implementation of the KNN Classification Algorithm (cited by: 21)

Parallel Implementing KNN Classification Algorithm Using MapReduce Programming Mode
Abstract: To improve the ability of the k-nearest neighbor (KNN) algorithm to process large data sets, this paper uses the MapReduce parallel programming model, together with the characteristics of the KNN algorithm itself, to give a parallel implementation of KNN on the Hadoop platform. The parallelization is built from three functions: Map, Combine and Reduce. The Map function computes the similarity between each test sample and the training samples. The Combine function acts as a local Reduce operation, which cuts the volume of intermediate results and the communication overhead. The Reduce function takes the intermediate results produced by the other two functions, determines the k nearest neighbors, and makes the classification decision. Experimental results show that, compared with earlier single-machine methods, the parallel KNN algorithm implemented on a Hadoop cluster achieves good linear speedup as the number of compute nodes increases, and good scalability.
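The division of work among Map, Combine and Reduce described in the abstract can be sketched in plain Python, without Hadoop. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the key/value layout, the similarity measure, or the voting rule, so the sketch assumes that the test sample ID is the key, that Euclidean distance stands in for similarity, and that the class is chosen by majority vote. The names `knn_map`, `knn_combine`, `knn_reduce` and `run_job` are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict

K = 3  # number of neighbours; an illustrative choice, not taken from the paper


def knn_map(train_split, test_samples):
    """Map: emit (test_id, (distance, label)) for every test/train pair in one split.

    Assumption: Euclidean distance is used in place of the paper's similarity measure.
    """
    for test_id, test_vec in test_samples:
        for train_vec, label in train_split:
            yield test_id, (math.dist(test_vec, train_vec), label)


def knn_combine(pairs, k=K):
    """Combine: a local reduce that keeps only the k closest candidates per test sample."""
    grouped = defaultdict(list)
    for test_id, candidate in pairs:
        grouped[test_id].append(candidate)
    for test_id, candidates in grouped.items():
        for candidate in heapq.nsmallest(k, candidates):
            yield test_id, candidate


def knn_reduce(pairs, k=K):
    """Reduce: merge candidates from all splits, take the global k nearest, then vote."""
    grouped = defaultdict(list)
    for test_id, candidate in pairs:
        grouped[test_id].append(candidate)
    result = {}
    for test_id, candidates in grouped.items():
        nearest = heapq.nsmallest(k, candidates)
        votes = Counter(label for _, label in nearest)
        result[test_id] = votes.most_common(1)[0][0]  # assumed majority vote
    return result


def run_job(train_splits, test_samples, k=K):
    """Simulate one job: map and combine per split, then a single reduce."""
    shuffled = []
    for split in train_splits:
        shuffled.extend(knn_combine(knn_map(split, test_samples), k))
    return knn_reduce(shuffled, k)


# Toy data: two training splits, as if stored on two different nodes.
train_splits = [
    [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b"), ((4.9, 5.1), "b")],
    [((5.1, 4.9), "b"), ((4.8, 5.2), "b"), ((0.2, 0.1), "a"), ((0.1, 0.0), "a")],
]
test_samples = [("t1", (0.0, 0.1)), ("t2", (5.0, 5.1))]

predictions = run_job(train_splits, test_samples)
print(predictions)
```

Because the global k nearest neighbours of a test sample are always among the k nearest within each split, pruning in the combine step does not change the final result. It only reduces how many (distance, label) pairs have to cross the network to the reducer.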
Source: Journal of Nanjing University of Aeronautics & Astronautics (南京航空航天大学学报), indexed in EI, CAS, CSCD and the Peking University Core Journals list; 2013, No. 4, pp. 550-555 (6 pages)
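Funding: Supported by the National Natural Science Foundation of China (61173143), the Natural Science Foundation of Jiangsu Province (BK2010380), the China Postdoctoral Science Foundation (2012M511303), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.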
Keywords: KNN classification; parallel computing; MapReduce programming model; Hadoop