Clustering with Mixed Condition Attributes Based on Average Mutual Information (基于平均互信息的混合条件属性聚类算法)


Abstract: Distances between the parameters of mixed condition attributes differ greatly in magnitude. As a result, clustering tends to aggregate only objects with numeric condition attributes, whose distances are larger in magnitude and more regular, and to neglect objects with categorical condition attributes, whose distances are smaller in magnitude and more chaotic but whose category characteristics are more pronounced. A clustering algorithm based on average mutual information is proposed. First, entropy is used to quantify the strength of each parameter's category characteristics. Then, the average mutual information computed from entropy measures the similarity and dissimilarity of category characteristics between data objects, which brings the distances of numeric and categorical condition attribute parameters to the same order of magnitude. Finally, the clustering result is obtained through an optimized, iterative, adaptive process. Experimental results show that the proposed algorithm achieves good clustering quality and adaptability.
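The abstract names two quantities, entropy and average mutual information, without stating their formulas. The following is a minimal sketch that assumes the standard Shannon definitions estimated from observed frequencies; it is not the paper's exact distance measure, and the function names `entropy` and `average_mutual_information` are hypothetical names chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from value frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def average_mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y))), in bits.

    xs and ys are paired observations of two categorical attributes.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


if __name__ == "__main__":
    colour = ["red", "red", "blue", "blue"]
    shape = ["round", "round", "square", "square"]
    print(entropy(colour))
    print(average_mutual_information(colour, shape))
```

Under these definitions, two attributes that always change together share as much information as either one carries alone, while statistically independent attributes share none. Both quantities are measured in bits regardless of the attribute's raw scale, which is the kind of common unit the abstract relies on when it speaks of unifying orders of magnitude.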

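Author: Liu Jinsheng (刘晋胜)

Affiliation: College of Computer and Electronic Information, Guangdong University of Petrochemical Technology

Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 3, pp. 261-265 (5 pages)

Funding: Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011A090200088); Science and Technology Plan Project of Maoming City, Guangdong Province (2012B009); Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis

Keywords: mixed condition attributes; average mutual information; clustering

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]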



