
An Improved k-means Algorithm Based on Outliers and Original Clustering Center (cited by 7)
Abstract: The classic k-means algorithm is one of the most widely used clustering methods, but it has two weaknesses: its result depends on randomly chosen initial centroids, and it is sensitive to outliers. This paper presents an improved k-means algorithm that addresses both. Outliers are first removed with a distance-based method, and the initial centroids are then chosen by a nearest-neighbour absorption method instead of at random. Comparative experiments with the original and the improved algorithm show that the improved version is more stable and more accurate, and is somewhat less affected by outliers and by the choice of initial centroids.
Source: Journal of Yangtze University (Natural Science Edition), Sci & Eng; indexed in CAS; 2009, No. 1, pp. 60-62 (3 pages).
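Funding: Science and Technology Research Project of the Heilongjiang Provincial Department of Education (11521008); Natural Science Foundation of Heilongjiang Province (F200603).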
Keywords: k-means algorithm; outliers; initial clustering centroid; distance
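The abstract describes a three-stage pipeline: distance-based outlier removal, deterministic selection of initial centroids by neighbour absorption, then ordinary k-means. The pure-Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation. The outlier test assumes the DB(frac, radius) distance-based definition in the style of Knorr and Ng (a point is an outlier when at least a fraction `frac` of the other points lie farther away than `radius`). `absorb_init` is a hypothetical reading of "nearest-neighbour absorption": k times, take the tightest group of n/k mutual neighbours, absorb it, and use its mean as a centroid. The function names and the parameters `radius`, `frac` and `max_iter` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(group: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def remove_outliers(points: List[Point], radius: float, frac: float) -> List[Point]:
    """Assumed DB(frac, radius) test: drop a point when at least `frac` of the
    other points lie farther than `radius` from it."""
    n = len(points)
    if n <= 1:
        return list(points)
    kept = []
    for i, p in enumerate(points):
        far = sum(1 for j, q in enumerate(points) if j != i and dist(p, q) > radius)
        if far / (n - 1) < frac:
            kept.append(p)
    return kept


def absorb_init(points: List[Point], k: int) -> List[Point]:
    """Hypothetical neighbour absorption: k times, pick the remaining point whose
    m = n // k nearest remaining points (itself included) have the smallest total
    distance, absorb that group, and use its mean as the next centroid."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    m = len(points) // k
    remaining = list(range(len(points)))
    centroids = []
    for _ in range(k):
        best_spread, best_group = math.inf, []
        for i in remaining:
            group = sorted(remaining, key=lambda j: dist(points[i], points[j]))[:m]
            spread = sum(dist(points[i], points[j]) for j in group)
            if spread < best_spread:
                best_spread, best_group = spread, group
        centroids.append(mean([points[j] for j in best_group]))
        absorbed = set(best_group)  # absorbed points cannot seed a later centroid
        remaining = [j for j in remaining if j not in absorbed]
    return centroids


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c])) for p in points]


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else old)  # keep empty clusters in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
            (50.0, 50.0)]  # the last point is a deliberate outlier
    cleaned = remove_outliers(data, radius=3.0, frac=0.9)
    init = absorb_init(cleaned, k=2)
    centroids, labels = kmeans(cleaned, init)
    print(cleaned, init, centroids, labels)
```

On the toy data, the point (50, 50) is dropped and k-means starts from the means of the two dense groups, (0.5, 0.5) and (10.5, 10.5), so in this sketch no random seed is involved. The naive neighbour search costs O(n² log n) per centroid, which is acceptable only for small illustrative data sets.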

References (7)

  • [1] Lian Fengna, Wu Jinlin, Tang Qi. An improved K-means clustering algorithm[J]. Computer and Information Technology, 2008, 16(1): 38-40. (cited by 23)
  • [2] Marques J P (author), Wu Y F (trans.). Pattern Recognition: Concepts, Methods and Applications[M]. 2nd ed. Beijing: Tsinghua University Press, 2002: 51-74.
  • [3] Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining[EB/OL]. http://www.ece.northwestern.edu/~harsha/Clustering/sigmodfn.ps, 2008-12-15.
  • [4] Sambasivam S, Theodosopoulos N. Advanced data clustering methods of mining Web documents[J]. Issues in Informing Science and Information Technology, 2006, (3): 563-579.
  • [5] Chawla S, Sun P. SLOM: a new measure for local spatial outliers[J]. Knowledge and Information Systems, 2006, (4): 412-429.
  • [6] Yin Yaoren, Wang Deguang. Application of an improved k-means clustering algorithm in intrusion detection[J]. Science Technology and Engineering, 2008, 8(16): 4701-4705. (cited by 7)
  • [7] Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases[J]. Information Systems, 2001, 26(1): 35-58.

Secondary references (7)

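  • [1] Lu Shenglian, Lin Shimin. Research on distance-based outlier detection[J]. Computer Engineering and Applications, 2004, 40(33): 73-75. (cited by 44)
  • [2] Yuan Fang, Meng Zenghui, Yu Ge. Improvement of the k-means clustering algorithm[J]. Computer Engineering and Applications, 2004, 40(36): 177-178. (cited by 47)
  • [3] Agrawal R, et al. Database mining: a performance perspective[J]. IEEE Transactions on Knowledge and Data Engineering, 1993, 5(6): 914-925.
  • [4] Han J W, Kamber M. Data Mining: Concepts and Techniques[M]. Translated by Fan Ming, Meng Xiaofeng. Beijing: China Machine Press, 2001: 147-158.
  • [5] Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis[M]. New York: John Wiley & Sons, 1990.
  • [6] Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases[C]//Haas L M, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998: 73-84.
  • [7] Zhang Yufang, Mao Jiali, Xiong Zhongyang. An improved K-means algorithm[J]. Journal of Computer Applications, 2003, 23(8): 31-33. (cited by 72)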

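Documents sharing references with this paper: 26
Documents co-cited with this paper: 55
Citing documents: 7
Second-level citing documents: 34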
