
Bisecting K-means Clustering Method Based on Cohesion and Coupling (基于内聚度和耦合度的二分K均值方法)

Cited by: 4
Abstract: Clustering analysis is one of the most important techniques in data mining, and it is widely applied across many social and economic fields. K-means is one of the most classic and widely used clustering methods, but it depends heavily on its initial conditions and the number of clusters is difficult to determine, which limits its range of application. This paper introduces definitions and measures for the cohesion and coupling of clusters. Following the principle of "high cohesion and low coupling", clusters are repeatedly split and merged during bisecting K-means clustering, and the clustering result is checked against the requirements to determine the number of clustering passes and the number of clusters, thereby improving the bisecting K-means procedure. Experiments on the Iris data set show that the algorithm is more stable and also achieves relatively high clustering accuracy.
Authors: YU Yong (郁湧)1,2, KANG Qing-yi (康庆怡)1, CHEN Chang-geng (陈长赓)1, KAN Shi-lin (阚世林)1, LUO Yong-jun (骆永军) (1 School of Software, Yunnan University, Kunming 650504, China; 2 Key Laboratory for Software Engineering of Yunnan Province, Kunming 650504, China)
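Source: 《计算机科学》 (Computer Science), indexed in CSCD and the Peking University Core Journals list (北大核心), 2018, Issue B06, pp. 460-464 (5 pages)
Funding: National Natural Science Foundation of China (61462091); Yunnan University Data-Driven Software Engineering Provincial Science and Technology Innovation Team Project (2017HC012)
Keywords: clustering; bisecting K-means; cohesion; coupling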
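
The split-and-merge idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's definitions of cohesion and coupling, so the code below substitutes simple stand-ins: `spread` (mean distance to the centroid, where a low value means high cohesion) and `coupling` (combined spread divided by the distance between centroids). The function names, both thresholds, the deterministic farthest-point seeding, and the simplification to one split phase followed by one merge phase are all assumptions made for illustration, not the authors' method.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def spread(points):
    """Assumed inverse-cohesion measure: mean distance to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def coupling(a, b):
    """Assumed coupling measure: combined spread relative to the centroid gap."""
    gap = math.dist(centroid(a), centroid(b))
    return math.inf if gap == 0 else (spread(a) + spread(b)) / gap

def bisect(points, iters=20):
    """Split one cluster in two with 2-means, seeded with far-apart points."""
    c = centroid(points)
    s1 = max(points, key=lambda p: math.dist(p, c))
    s2 = max(points, key=lambda p: math.dist(p, s1))
    halves = None
    for _ in range(iters):
        left = [p for p in points if math.dist(p, s1) <= math.dist(p, s2)]
        right = [p for p in points if math.dist(p, s1) > math.dist(p, s2)]
        if not left or not right:
            break
        halves = (left, right)
        n1, n2 = centroid(left), centroid(right)
        if (n1, n2) == (s1, s2):  # centers stopped moving
            break
        s1, s2 = n1, n2
    return halves

def split_and_merge(points, max_spread, max_coupling):
    """Bisect loose clusters, then merge pairs that are too tightly coupled."""
    clusters = [list(points)]
    # Split phase: keep bisecting the loosest cluster until all are tight.
    while True:
        worst = max(clusters, key=spread)
        if spread(worst) <= max_spread:
            break
        halves = bisect(worst)
        if halves is None:
            break
        clusters.remove(worst)
        clusters.extend(halves)
    # Merge phase: repeatedly merge the most strongly coupled pair.
    while len(clusters) > 1:
        score, i, j = max(
            (coupling(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if score <= max_coupling:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Three well-separated toy blobs (made-up data, not the Iris set).
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 0), (10, 1), (11, 0), (11, 1)]
blob_c = [(30, 0), (30, 1), (31, 0), (31, 1)]
found = split_and_merge(blob_a + blob_b + blob_c, max_spread=1.0, max_coupling=0.5)
print(len(found), "clusters found")
```

Note that the number of clusters is not passed in: it falls out of the two thresholds, which mirrors the abstract's point that the stopping check determines the cluster count. How the paper actually sets or avoids such thresholds is not stated in the abstract.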
