
K-Nearest Neighbor Algorithm for Large Data Sets Based on Hash Technology and MapReduce (cited by: 7)
Abstract K-nearest neighbor (K-NN) is a well-known classification algorithm. Because it is simple and easy to implement, it has been widely applied in many fields, such as face recognition, gene classification, and decision support. In big data environments, however, K-NN becomes very inefficient, or even infeasible. To address this problem, this paper proposes a K-NN classification algorithm for large data sets based on hash technology and MapReduce. To verify its effectiveness, experiments were conducted on 4 large data sets. The results show that the proposed algorithm greatly improves the efficiency of K-NN while preserving its classification ability.
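Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2017, No. 7, pp. 210-214 (5 pages)
Funding Supported by the National Natural Science Foundation of China (71371063), the Natural Science Foundation of Hebei Province (F2017201026), the Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD20131028), and the Hebei University Graduate Innovation Funding Project (X2016059).
Keywords K-nearest neighbor; hash technology; classification algorithms; big data sets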
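
The abstract names hash technology and MapReduce but does not say which hash function is used or how the job is split. The sketch below is therefore only an illustration of the general idea, under stated assumptions: it uses random-hyperplane (sign) locality-sensitive hashing as a stand-in for the paper's hash, and it simulates the map and shuffle steps in a single Python process rather than on a cluster. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, planes):
    """Map a vector to a bit tuple: one bit per hyperplane side."""
    return tuple(
        1 if sum(p_i * x_i for p_i, x_i in zip(plane, x)) >= 0 else 0
        for plane in planes
    )


def map_phase(train, planes):
    """Map step: emit (bucket key, (vector, label)) for each training point."""
    for x, label in train:
        yield hash_code(x, planes), (x, label)


def shuffle(pairs):
    """Shuffle step: group the emitted values by bucket key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def classify(query, buckets, planes, k):
    """Reduce-like step: vote among the k nearest points in the query's bucket.

    Returns None when the query's bucket is empty; a fuller version would
    probe neighboring buckets, which this sketch does not attempt.
    """
    candidates = buckets.get(hash_code(query, planes), [])
    if not candidates:
        return None
    nearest = sorted(candidates, key=lambda item: sq_dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

The speed-up in a scheme like this comes from comparing each query only against the training points that share its bucket instead of the whole data set; the cost is that a true neighbor can land in a different bucket, so accuracy depends on the number of hash bits chosen.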