Parallel CURE algorithm with Binary-Positive

Cited by: 3
Abstract: When the CURE algorithm processes massive, non-uniformly distributed data, random sampling may not produce a representative sample. To address this problem, the paper proposes a robust parallelized improvement of the algorithm. The improved algorithm uses the Binary-Positive algorithm to obtain the effective attributes of the original data, then performs hierarchical clustering on the effective data using the MapReduce parallel framework, achieving a trade-off between accuracy and efficiency. Experimental analysis shows that the improved CURE algorithm runs more efficiently and still produces good clustering results.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 11, pp. 58-61 (4 pages)
Funding: National Natural Science Foundation of China (No. 61073196); Special Scientific Research Program of the Shaanxi Provincial Department of Education (No. 11JK0982)
Keywords: clustering; Clustering Using Representatives (CURE); Binary-Positive; MapReduce; parallel
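
The abstract builds on CURE, whose central idea is to describe each cluster by a few well-scattered representative points that are shrunk toward the cluster centroid. This record gives no implementation details, so the following is only a minimal single-machine sketch of that standard CURE step as described in the original CURE paper by Guha, Rastogi, and Shim. It does not implement the Binary-Positive preprocessing or the MapReduce parallelization, and all function names and the parameters `c` and `alpha` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Arithmetic mean of the points, coordinate by coordinate."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def scattered_points(points: Sequence[Point], c: int) -> List[Point]:
    """Pick up to c well-scattered points with a farthest-point heuristic.

    The first point is the one farthest from the centroid; each later point
    is the one whose minimum distance to the already chosen points is largest.
    """
    unique = list(dict.fromkeys(points))  # drop duplicates, keep order
    center = centroid(unique)
    chosen = [max(unique, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(unique)):
        candidates = [p for p in unique if p not in chosen]
        chosen.append(
            max(candidates, key=lambda p: min(math.dist(p, q) for q in chosen))
        )
    return chosen


def shrink(reps: Sequence[Point], center: Point, alpha: float) -> List[Point]:
    """Move each representative a fraction alpha of the way to the centroid."""
    return [
        tuple(r[d] + alpha * (center[d] - r[d]) for d in range(len(r)))
        for r in reps
    ]


def cluster_distance(reps_a: Sequence[Point], reps_b: Sequence[Point]) -> float:
    """Distance between two clusters: closest pair of their representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


# Hypothetical toy cluster: four corners of a square plus its center.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
reps = shrink(scattered_points(cluster, c=2), centroid(cluster), alpha=0.5)
```

In hierarchical clustering with this scheme, the two clusters with the smallest `cluster_distance` are merged at each step and the representatives of the merged cluster are recomputed. Shrinking by `alpha` dampens the effect of outliers, while keeping several representatives lets clusters take non-spherical shapes.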
