
An Improved Algorithm of K-means (一种改进的K-means聚类算法)

Cited by: 23
Abstract: K-means is one of the most widely used clustering algorithms. It has many advantages, but it also has shortcomings: it is sensitive to the input order of the samples, it may converge to a local optimum, and it is strongly affected by outliers. To address these shortcomings, this paper proposes an improved K-means algorithm. The improvements concern data preprocessing and the selection of the initial cluster centers. Comparison experiments on the algorithm before and after the changes show that the improved algorithm is more stable, more accurate, and much less affected by outliers.
Source: Computer and Information Technology (《电脑与信息技术》), 2008, No. 1, pp. 38-40 (3 pages)
Keywords: K-means algorithm; clustering; outliers
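
The abstract names two improvements, data preprocessing and initial-center selection, but this record does not give their details. As a rough illustration only, the sketch below assumes a k-nearest-neighbour distance filter for outlier removal and farthest-first selection of the initial centers. Both are stand-ins chosen for this example and are not necessarily the authors' method; the names `drop_outliers`, `pick_centers`, `n_neighbors`, and `factor` are hypothetical.

```python
import math
from statistics import median


def drop_outliers(points, n_neighbors=2, factor=3.0):
    """Assumed preprocessing: drop points whose mean distance to their
    n_neighbors nearest neighbours exceeds factor * median of that score."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:n_neighbors]) / n_neighbors)
    cutoff = factor * median(scores)
    return [p for p, s in zip(points, scores) if s <= cutoff]


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def pick_centers(points, k):
    """Assumed initialisation: farthest-first traversal. Start from the point
    nearest the global mean, then repeatedly add the point farthest from
    the centers chosen so far. No random choice is involved."""
    mean = centroid(points)
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations, started from pick_centers."""
    centers = pick_centers(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

A typical call would filter first and cluster second, for example `centers, clusters = kmeans(drop_outliers(points), k=2)`. Separating the two steps keeps the outlier rule replaceable without touching the clustering loop.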