Clustering with Mixed Condition Attributes Based on Average Mutual Information
Abstract In data with mixed condition attributes, the distances computed on different attribute types differ greatly in magnitude. As a result, clustering tends to group objects only by their numeric condition attributes, whose distances are large and regular, and to ignore the categorical condition attributes, whose distances are small and irregular but which carry more distinctive class information. This paper proposes a clustering algorithm based on average mutual information. Entropy is first used to quantify how strongly each attribute parameter characterizes a class. The average mutual information derived from entropy then measures how much class information data objects share and how much they differ, which brings the distances of numeric and categorical condition attributes to the same order of magnitude. Finally, an iterative, adaptive optimization process yields the clustering result. Experimental results show that the algorithm achieves good clustering quality and adaptability.
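Author Liu Jinsheng (刘晋胜)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 3, pp. 261-265 (5 pages)
Funding Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011A090200088); Maoming City (Guangdong Province) Science and Technology Plan Project (2012B009); supported by the Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis
Keywords Mixed condition attributes; Average mutual information; Clustering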
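
The abstract names two quantities, entropy and average mutual information, without giving their formulas. The sketch below shows the textbook definitions only, I(X;Y) = H(X) + H(Y) - H(X,Y) with Shannon entropy in bits; it is not the paper's distance measure or its iterative procedure. The function names and the equal-width binning in `discretize` are assumptions made here for illustration, as one simple way to put a numeric attribute on the same information-theoretic scale as a categorical one.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Average mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def discretize(values, bins=4):
    """Hypothetical helper: equal-width binning of a numeric attribute,
    so it can be passed to the categorical functions above."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Toy data (invented for this example): one categorical, one numeric attribute.
color = ["red", "red", "blue", "blue"]
size = [1.0, 1.2, 9.8, 10.1]

shared = mutual_information(color, discretize(size, bins=2))
```

On this toy data the two attributes split the objects identically, so the mutual information equals the entropy of either attribute; for statistically independent attributes it would be zero. Both values are in bits regardless of the original attribute type, which is the sense in which an entropy-based measure avoids the magnitude mismatch the abstract describes.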