Abstract
K-nearest neighbor (K-NN) is a well-known classification algorithm. Because it is simple and easy to implement, it has been widely applied in many fields, such as face recognition, gene classification, and decision support. In big data environments, however, K-NN becomes very inefficient or even infeasible. To address this problem, this paper proposes a K-NN classification algorithm for large data sets based on hash technology and MapReduce. To verify its effectiveness, experiments were conducted on four large data sets. The results show that the proposed algorithm greatly improves the efficiency of K-NN while preserving its classification ability.
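The abstract does not say which hash function or MapReduce job layout the paper uses, so the following is only a minimal sketch of the general idea: hash training points into buckets in a map step, then search for neighbors only inside the query's bucket. The random-hyperplane sign hash, the in-process `map_phase`/`shuffle` stand-ins for a real MapReduce job, the fallback to a full scan on an empty bucket, and all function names are assumptions made for illustration, not the authors' method.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hash family: sign of a projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(x, planes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return tuple(int(sum(w * v for w, v in zip(p, x)) >= 0.0) for p in planes)


def map_phase(train, planes):
    """Map step: emit (bucket key, labelled point) for every training point."""
    for x, label in train:
        yield hash_key(x, planes), (x, label)


def shuffle(pairs):
    """Stand-in for the MapReduce shuffle: group emitted values by key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def classify(query, buckets, planes, k=3):
    """K-NN restricted to the query's bucket, followed by a majority vote."""
    candidates = buckets.get(hash_key(query, planes), [])
    if not candidates:
        # Assumption: fall back to scanning every bucket when the bucket is empty.
        candidates = [item for bucket in buckets.values() for item in bucket]
    nearest = sorted(
        candidates,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [((5.0, 5.0), "a"), ((5.0, 6.0), "a"), ((-5.0, -5.0), "b"), ((-6.0, -5.0), "b")]
    planes = make_hyperplanes(dim=2, n_bits=2)
    buckets = shuffle(map_phase(train, planes))
    print(classify((5.0, 5.5), buckets, planes, k=3))
```

Restricting the distance computation to one bucket is what would reduce the cost relative to a full scan; the price is that true neighbors hashed into other buckets are missed, which is why the abstract's claim of preserved classification ability has to be checked experimentally.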
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2017, No. 7, pp. 210-214 (5 pages)
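Funding
National Natural Science Foundation of China (71371063)
Natural Science Foundation of Hebei Province (F2017201026)
Key Project of Science and Technology Research of Higher Education Institutions of Hebei Province (ZD20131028)
Hebei University Postgraduate Innovation Funding Project (X2016059)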
Keywords
K-nearest neighbor
Hash technology
Classification algorithms
Big data sets