Improved Multi-view Clustering Ensemble Algorithm (cited by: 8)
Abstract: In recent years, data mining techniques and machine learning algorithms for big data have become increasingly important. In clustering, the growing volume of multi-view data has made multi-view clustering an important class of methods. However, most existing multi-view clustering algorithms are sensitive to parameter settings and to the data samples themselves, so their results are unstable and their parameters need repeated tuning. To address this, the paper proposes an improved multi-view clustering ensemble algorithm based on the multi-view K-means algorithm and clustering ensemble techniques, which improves the accuracy, robustness, and stability of the clustering results. In addition, because a multi-view clustering algorithm running on a single machine has limited computational resources and struggles with massive data, the authors implement a distributed, parallel multi-view clustering ensemble algorithm using distributed processing technology. Experiments show that the parallel algorithm substantially improves time efficiency on large datasets and is suitable for multi-view clustering analysis in big data environments.
Authors: DENG Qiang, YANG Yan, WANG Hao (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 1, pp. 65-70 (6 pages)
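Funding: Supported by the National Natural Science Foundation of China (61170111, 61572407, 61134002), the National Science and Technology Support Program of China (2015BAH19F02), and the Sichuan Provincial Science and Technology Support Program (2014SZ0207).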
Keywords: multi-view clustering, clustering ensemble, distributed computation, parallelization
