
Research on Parallelization of Fuzzy K-means Algorithm Based on MapReduce (cited by: 1)
Abstract: The fuzzy K-means algorithm is an important soft clustering algorithm that can quantitatively determine the relationships between objects. To address its shortcomings in analyzing and processing large-scale data, this paper proposes a parallel implementation based on the MapReduce programming model. First, to speed up MapReduce tasks, the design of the Combine function is improved: intermediate results are processed locally before the output of the Map function is passed to the Reduce functions on other nodes, which reduces communication overhead. Then, several data sets of different sizes are tested on the Hadoop distributed computing platform. The experiments show that the MapReduce-based parallel fuzzy K-means algorithm is suitable for analyzing and processing large-scale data, that its execution speed is increased by about 1.9 times, and that its clustering effect is more pronounced.
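The abstract does not include the authors' code, so the following is only a minimal Python sketch of how one fuzzy K-means iteration can be split into Map, Combine, and Reduce steps. It assumes the standard fuzzy K-means (fuzzy c-means) update rules with fuzzifier m: membership u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)) and centroid c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m. Everything runs in a single process with no Hadoop; the function names (`map_phase`, `combine_phase`, `reduce_phase`, `fuzzy_kmeans`), the choice to emit `(u^m·x, u^m)` pairs per cluster, and the default m = 2 are hypothetical illustrations, not the paper's design.

```python
import math

# Hypothetical sketch: fuzzy K-means (fuzzy c-means update rules) with each
# iteration expressed as map -> combine -> reduce, simulated in one process.


def memberships(point, centroids, m):
    """Fuzzy membership of `point` in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    for j, d in enumerate(dists):
        if d == 0.0:  # point sits on a centroid: full membership there
            return [1.0 if k == j else 0.0 for k in range(len(dists))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def map_phase(split, centroids, m):
    """Map: for each point and cluster j, emit (j, (u^m * x, u^m))."""
    for x in split:
        for j, u in enumerate(memberships(x, centroids, m)):
            w = u ** m
            yield j, ([w * xi for xi in x], w)


def combine_phase(pairs):
    """Combine: sum the map output locally, one partial record per cluster."""
    partial = {}
    for j, (wx, w) in pairs:
        acc, total = partial.get(j, ([0.0] * len(wx), 0.0))
        partial[j] = ([a + b for a, b in zip(acc, wx)], total + w)
    return partial  # at most k records leave the node instead of n * k


def reduce_phase(partials, centroids):
    """Reduce: merge partial sums from all nodes into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    weights = [0.0] * k
    for partial in partials:
        for j, (wx, w) in partial.items():
            sums[j] = [a + b for a, b in zip(sums[j], wx)]
            weights[j] += w
    # Keep the old centroid if a cluster received no weight at all.
    return [
        [s / weights[j] for s in sums[j]] if weights[j] > 0 else list(centroids[j])
        for j in range(k)
    ]


def fuzzy_kmeans(splits, centroids, m=2.0, max_iter=50, tol=1e-6):
    """Driver: one map/combine/reduce round per iteration until centroids settle."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        partials = [combine_phase(map_phase(s, centroids, m)) for s in splits]
        new = reduce_phase(partials, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    splits = [data[:3], data[3:]]  # two simulated mapper input splits
    print(fuzzy_kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Because the combiner collapses the n·k records a node's map step emits into at most k partial sums, partitioning the data changes only how the sums are grouped, not the resulting centroids (up to floating-point rounding). This is the kind of communication saving that local intermediate-result processing is meant to provide.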
Authors: YANG Yanqing (杨延庆), YUAN Huabing (袁华兵); Division of Information Technology, Xi'an Medical University, Xi'an 710021
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 7, pp. 1564-1567, 1765 (5 pages)
Funding: Shaanxi Provincial Youth Science Foundation Project (No. 71701160); Xi'an Medical University Teaching Reform Research Project (No. 2018JG-07)
Keywords: fuzzy K-means; MapReduce model; Combine function; Hadoop platform