
Multi-samples Merging Clustering Algorithms Combining PAM Algorithm and Careful Seeding Based on MapReduce
Abstract: The traditional PAM (Partitioning Around Medoids) algorithm has high time complexity and is inefficient on large data sets. In recent years, a growing number of researchers have used the MapReduce model to improve the performance of clustering algorithms. However, during algorithm iteration the MapReduce model must repeatedly restart jobs, read data from the file system, and shuffle data, all of which reduce data processing efficiency. This paper proposes two MapReduce-based clustering models that combine the PAM algorithm with careful seeding, significantly improving performance while preserving the clustering validity of PAM. Performance and cluster-validity experiments show that the proposed methods achieve the expected results and are efficient, robust, and scalable.
Authors: 赵宝文 (Zhao Baowen), 徐华 (Xu Hua)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list; 2017, No. 10, pp. 2281-2285 (5 pages)
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20140165) and by a China Scholarship Council sponsored project (201308320030)
Keywords: partitioning around medoids (PAM) clustering algorithm; MapReduce; probability sampling; performance; cluster validity
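
The abstract names two building blocks, PAM and careful seeding, without giving details. The following is a minimal single-machine sketch of how the two might fit together, not the authors' MapReduce models. It assumes that "careful seeding" means distance-squared-weighted probability sampling in the style of k-means++, and it uses a simple first-improvement swap loop for PAM. The function names `careful_seeding`, `total_cost`, and `pam` are hypothetical and chosen for illustration only.

```python
import math
import random


def careful_seeding(points, k, rng):
    """Pick up to k initial medoids by D^2-weighted sampling.

    Assumption: this is k-means++-style seeding; the paper's exact
    sampling scheme is not given in the abstract.
    """
    medoids = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen medoid.
    d2 = [math.dist(p, medoids[0]) ** 2 for p in points]
    while len(medoids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen medoid; return fewer than k.
            break
        chosen = rng.choices(points, weights=d2, k=1)[0]
        medoids.append(chosen)
        d2 = [min(old, math.dist(p, chosen) ** 2) for p, old in zip(points, d2)]
    return medoids


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k, seed=0):
    """Seed with careful_seeding, then swap medoids while the cost drops."""
    rng = random.Random(seed)
    medoids = careful_seeding(points, k, rng)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for cand in points:
                if cand in medoids:
                    continue
                # Try replacing medoid i with a non-medoid candidate.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = pam(points, k=2)
```

Each pass of the swap loop re-evaluates the full cost for every candidate, which illustrates why the abstract describes PAM as expensive on large data sets; good seeding reduces the number of passes needed but not the cost of a single pass.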