
Research on Clustering Algorithm on Hadoop Platform (Hadoop云平台下的聚类算法研究)
Cited by: 6
Abstract: Machine learning algorithms struggle to complete their tasks within a practical time on massive data sets, and they handle high-dimensional, large-scale data poorly. To address these problems, a parallelized spectral clustering algorithm based on the Hadoop distributed platform is proposed. The traditional spectral clustering algorithm is rewritten using the MapReduce programming model, and the Canopy algorithm is used on the same platform to preprocess the data in order to achieve better clustering results. Experimental results show that the distributed clustering algorithm performs well in terms of speedup and related measures, that it scales well as the data size grows, and that the improved algorithm is suitable for processing massive data.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, PKU Core Journal (北大核心), 2014, No. 5, pp. 1683-1687 (5 pages)
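Funding: National Natural Science Foundation of China (61163025); Natural Science Foundation of Inner Mongolia (2012MS0912); Scientific Research Fund for Higher Education Institutions of the Inner Mongolia Department of Education (njzy12110)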
Keywords: massive data; machine learning; clustering algorithm; spectral clustering; distributed framework
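
The abstract names Canopy preprocessing but does not describe the procedure. As a rough illustration only, here is a minimal single-machine sketch of the standard Canopy technique with a loose threshold and a tight threshold. It is not the paper's MapReduce implementation; the function name `canopy`, the parameters `t1` and `t2`, the use of Euclidean distance, and the strict `<` comparisons are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2, distance=euclidean):
    """Group points into overlapping canopies (illustrative sketch).

    t1 is the loose threshold, t2 the tight threshold, with t1 > t2 > 0.
    Returns a list of (center_index, member_indices) pairs.
    """
    if not (t1 > t2 > 0):
        raise ValueError("thresholds must satisfy t1 > t2 > 0")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy,
        # so a point may belong to more than one canopy.
        members = [
            i for i in range(len(points))
            if distance(points[center], points[i]) < t1
        ]
        # Points within the tight threshold can no longer become centers.
        # The center itself is at distance 0, so the loop always shrinks.
        remaining = [
            i for i in remaining
            if distance(points[center], points[i]) >= t2
        ]
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    sample = [(0, 0), (0.5, 0), (1.5, 0), (10, 10), (10.5, 10)]
    for center, members in canopy(sample, t1=3.0, t2=1.0):
        print(center, members)
```

In a pipeline like the one the abstract outlines, the canopies would typically serve to limit which pairs of points need an exact distance or similarity computation, or to suggest an initial number of clusters; how the paper actually uses them is not stated in this record.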


