Improving K-means Algorithm Based on Entropy
Abstract: In high-dimensional data, the similarity measure used by K-means runs into problems such as attributes on different scales and attributes of different data types. This paper proposes improving the K-means algorithm with a data normalization preprocessing step. Building on a discussion of how to select initial centers for one-dimensional data, it then proposes an entropy-based method for selecting initial centers in high-dimensional data; the improved initial center selection reduces the number of iterations K-means needs. Experimental results show that normalizing the data fundamentally eliminates the effect of inconsistent data types on clustering.
Source: Journal of Guangdong Polytechnic Normal University (《广东技术师范学院学报》), 2008, No. 9, pp. 27-29, 40 (4 pages)
Keywords: k-means; clustering; entropy; center
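
The abstract's point about scale can be illustrated with a small sketch. The abstract does not say which normalization the authors use, so the example below assumes min-max scaling to [0, 1]; the function names and the sample records (age, income) are hypothetical and not taken from the paper. It shows how the Euclidean distance that K-means relies on is dominated by the large-scale attribute until the data are normalized.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of `rows` to [0, 1] (assumed normalization).

    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance, the usual K-means dissimilarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records: (age in years, income in currency units).
data = [
    [25.0, 30000.0],
    [60.0, 30500.0],  # very different age, almost the same income
    [26.0, 32000.0],  # almost the same age, slightly different income
    [40.0, 90000.0],
]
scaled = min_max_normalize(data)

# Before scaling, income dominates: record 1 looks closer to record 0.
raw_closest = min((1, 2), key=lambda i: euclidean(data[0], data[i]))
# After scaling, both attributes contribute: record 2 is closer.
scaled_closest = min((1, 2), key=lambda i: euclidean(scaled[0], scaled[i]))
print(raw_closest, scaled_closest)
```

Min-max scaling is only one possible choice; z-score standardization would serve the same illustrative purpose.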