Parallel k-modes Algorithm Based on MapReduce
Abstract: k-modes is a representative clustering algorithm for categorical data. This paper first improves the implementation of k-modes: when a data object is assigned to a cluster, the frequency counts of the attribute values in that cluster are updated, so the new cluster centers (modes) can be computed after a single pass over all data objects. To enable k-modes to handle large-scale categorical data, the algorithm is then implemented on the Hadoop platform with the MapReduce parallel computing model. Experiments show that, on large volumes of data, parallel k-modes greatly shortens clustering time compared with serial k-modes and achieves good speedup.
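Authors: Guo Tao (郭涛), Ding Xiangwu (丁祥武)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2015, No. 1, pp. 43-45 (3 pages)
Funding: National Natural Science Foundation of China (61103046); Natural Science Foundation of Shanghai (11ZR1401200)
Keywords: categorical data; k-modes; parallel clustering; MapReduce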
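
The single-pass mode update and the map/reduce split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in plain Python in one process instead of on Hadoop, and the function names (`hamming`, `map_partition`, `reduce_counts`, `kmodes`), the tie-breaking behaviour, and the handling of empty clusters are assumptions made for illustration. The distance used is the simple matching dissimilarity commonly used with k-modes.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def map_partition(records, modes):
    """Map side (hypothetical): assign each record to its nearest mode and
    update that cluster's per-attribute value counts at assignment time."""
    counts = {}  # cluster id -> list of Counter, one per attribute
    for rec in records:
        # Ties go to the lowest cluster id (an assumption of this sketch).
        cid = min(range(len(modes)), key=lambda k: hamming(rec, modes[k]))
        per_attr = counts.setdefault(cid, [Counter() for _ in rec])
        for counter, value in zip(per_attr, rec):
            counter[value] += 1
    return counts

def reduce_counts(partials, modes):
    """Reduce side (hypothetical): merge the partial counts of each cluster
    and take the most frequent value of every attribute as the new mode."""
    merged = {}
    for part in partials:
        for cid, per_attr in part.items():
            target = merged.setdefault(cid, [Counter() for _ in per_attr])
            for total, partial in zip(target, per_attr):
                total.update(partial)
    new_modes = list(modes)  # a cluster that received no records keeps its mode
    for cid, per_attr in merged.items():
        new_modes[cid] = tuple(c.most_common(1)[0][0] for c in per_attr)
    return new_modes

def kmodes(partitions, modes, max_iter=20):
    """Iterate map and reduce until the modes stop changing.
    Each iteration reads every record exactly once."""
    for _ in range(max_iter):
        partials = [map_partition(p, modes) for p in partitions]
        new_modes = reduce_counts(partials, modes)
        if new_modes == modes:
            break
        modes = new_modes
    return modes

if __name__ == "__main__":
    # Two partitions stand in for two input splits (toy data).
    p1 = [("a", "x"), ("a", "x"), ("b", "y")]
    p2 = [("a", "y"), ("b", "y"), ("b", "y")]
    print(kmodes([p1, p2], [("a", "y"), ("b", "y")]))
    # prints [('a', 'x'), ('b', 'y')]
```

Because the counts are accumulated while records are being assigned, no second scan of the data is needed to recompute the modes, and the per-partition counts are small enough to merge cheaply in the reduce step.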
