Improved clustering algorithm for categorical attribution data by using quantum mechanics

Cited by: 2
Abstract: An analysis of the quantum potential, the particle distribution mechanism of quantum mechanics, and the categorical quantum clustering (CQC) algorithm shows that CQC measures the dissimilarity between categorical attribute data objects with the traditional Hamming dissimilarity measure. That measure ignores both the meaning of the categorical attribute values themselves and the correlations among those values, which leads to poor clustering accuracy. A modified categorical quantum clustering (MCQC) algorithm is therefore proposed. It computes the dissimilarity between different values of the same attribute, and the dissimilarity measure between data objects, according to the correlations among the data objects, and so improves clustering accuracy. The simulation experiments used three data sets: the real soybean disease data set, the real congressional voting data set, and a synthetic data set built from the discrete attribute dimensions extracted from the KDD-CUP99 training set. The results show that the proposed algorithm is effective and feasible, and that its clustering accuracy on pure categorical, binary, and mixed-attribute data is markedly higher than that of the CQC algorithm.
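Source: Journal of Lanzhou University of Technology (indexed in CAS; Peking University Core Journals), 2009, No. 3, pp. 98-102 (5 pages)
Funding: Natural Science Foundation of Gansu Province (3ZS051-A25-032); Gansu Province Foundation for Postgraduate Supervisors in Higher Education Institutions (050301)
Keywords: categorical attribute data; quantum clustering; clustering algorithm; dissimilarity measure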
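
The contrast drawn in the abstract, between Hamming dissimilarity and a dissimilarity that accounts for how attribute values co-occur, can be illustrated with a minimal Python sketch. This is not the MCQC algorithm from the paper, whose formulas are not given in this record. The function names, the toy records, and the use of a mean total-variation distance between conditional distributions are all assumptions made purely for illustration.

```python
from collections import Counter


def hamming_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def conditional_distribution(records, attr, value, other):
    """Estimate P(records[other] | records[attr] == value) as a dict."""
    counts = Counter(r[other] for r in records if r[attr] == value)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"value {value!r} never occurs in attribute {attr}")
    return {v: c / total for v, c in counts.items()}


def value_dissimilarity(records, attr, x, y):
    """Hypothetical co-occurrence-based dissimilarity between two values.

    For every other attribute, compare the distribution of its values
    among records with attr == x against records with attr == y, using
    the total-variation distance, then average over those attributes.
    """
    others = [j for j in range(len(records[0])) if j != attr]
    total = 0.0
    for j in others:
        px = conditional_distribution(records, attr, x, j)
        py = conditional_distribution(records, attr, y, j)
        support = set(px) | set(py)
        total += 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in support)
    return total / len(others)


# Toy data (invented): attribute 0 is a colour, attribute 1 is a shape.
records = [
    ("red", "round"),
    ("red", "round"),
    ("crimson", "round"),
    ("blue", "square"),
    ("blue", "square"),
]

# Hamming treats "red" vs "crimson" exactly like "red" vs "blue".
print(hamming_dissimilarity(("red",), ("crimson",)))  # 1
print(hamming_dissimilarity(("red",), ("blue",)))     # 1

# The co-occurrence-based measure separates the two cases.
print(value_dissimilarity(records, 0, "red", "crimson"))  # 0.0
print(value_dissimilarity(records, 0, "red", "blue"))     # 1.0
```

In this toy data, "red" and "crimson" always appear with the same shape, so the co-occurrence-based measure treats them as identical, while Hamming dissimilarity counts any mismatch as 1 regardless of how the values behave in the rest of the data.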

References (11)

  • 1. GUHA S, RASTOGI R, SHIM K. CURE: an efficient clustering algorithm for large databases[C]//HAAS L M, TIWARY A. Proc of ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998: 73-84.
  • 2. HUANG Zhexue, NG M K. A fuzzy k-modes algorithm for clustering categorical data[J]. IEEE Trans on Fuzzy Systems, 1999, 7(4): 446-452.
  • 3. HUANG Zhexue. A fast clustering algorithm to cluster very large categorical data sets in data mining[C]//Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. New York: ACM Press, 1997: 1-8.
  • 4. HUANG Zhexue. Extensions to the k-means algorithm for clustering large data sets with categorical values[J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
  • 5. CHEN Ning, CHEN An, ZHOU Longxiang. Fuzzy K-Prototypes clustering algorithm for mixed numeric and categorical data (in English)[J]. Journal of Software, 2001, 12(8): 1107-1119. Cited by: 47
  • 6. LI Zhihua, WANG Shitong. A fuzzy clustering algorithm for categorical attribute data based on quantum mechanics[J]. Journal of System Simulation, 2008, 20(8): 2119-2122. Cited by: 6
  • 7. ESPOSITO F, MALERBA D, TAMMA V, et al. Classical resemblance measures, analysis of symbolic data[M]. New York: Springer, 2000: 139-152.
  • 8. AHMAD A, DEY L. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set[J]. Pattern Recognition Letters, 2007, 28: 110-118.
  • 9. LI Zhihua, WANG Shitong. An improved quantum clustering algorithm[J]. Journal of Data Acquisition and Processing, 2008, 23(2): 211-214. Cited by: 5
  • 10. GANTI V, GEHRKE J, RAMAKRISHNAN R. CACTUS-clustering data using summaries[C]//Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego: ACM Press, 1999: 311-314.

Second-level references (17)

  • 1. WU Wenli, LIU Yushu, ZHAO Jihai. A new hybrid clustering algorithm[J]. Journal of System Simulation, 2007, 19(1): 16-18. Cited by: 18
  • 2. LE Yixiang (乐逸祥), ZHOU Leishan, LE Qunxing (乐群星). Visual simulation and improvement of the particle swarm optimization algorithm[J]. Journal of System Simulation, 2007, 19(6): 1212-1216. Cited by: 6
  • 3. Huang Zhexue, IEEE Transactions Fuzzy Systems, 1999, vol. 7, no. 4, p. 446
  • 4. Huang Zhexue, Data Mining and Knowledge Discovery, 1998, vol. 2, p. 283
  • 5. Huang Zhexue, Proc the 1st Pacific Asia Conference on Knowledge Discovery and Data Mining, 1997, p. 21
  • 6. Gasiorowicz S. Quantum physics[M]. New York: Wiley, 1996.
  • 7. Horn D, Gottlieb A. The method of quantum clustering[J]. Proc of Advances in Neural Infor Proc Systems, 2001, 14: 769-776.
  • 8. Horn D, Gottlieb A. Algorithm for data clustering in pattern recognition problems based on quantum mechanics[J]. Physical Review Letters, 2002, 88(1): 018702.1-018702.4.
  • 9. Horn D. Clustering via Hilbert space[J]. Physica A, 2001, 302: 70-79.
  • 10. Horn D, Axel I. Novel clustering algorithm for microarray expression data in a truncated SVD space[J]. Bioinformatics, 2003, 19(9): 1110-1115.

Co-citing documents (49)

Co-cited documents (19)

  • 1. ZHOU Juan, XIONG Zhongyang, ZHANG Yufang, REN Fang. A multi-center clustering algorithm based on the max-min distance method[J]. Journal of Computer Applications, 2006, 26(6): 1425-1427. Cited by: 72
  • 2. YIN Xiaoming, GU Xingsheng. Fuzzy clustering based on an improved genetic algorithm[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2006, 32(7): 849-851. Cited by: 8
  • 3. SANGUTHEVAR R. Efficient parallel hierarchical-clustering algorithms[J]. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(6): 497-502.
  • 4. HUANG Zhexue, NG M K. A fuzzy k-modes algorithm for clustering categorical data[J]. IEEE Trans on Fuzzy Systems, 1999, 7(4): 446-452.
  • 5. HUANG Zhexue. A fast clustering algorithm to cluster very large categorical data sets in data mining[C]//Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. New York: ACM Press, 1997: 1-8.
  • 6. HUANG Zhexue. Extensions to the k-means algorithm for clustering large data sets with categorical values[J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
  • 7. AHMAD A, DEY L. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set[J]. Pattern Recognition Letters, 2007, 28(1): 110-118.
  • 8. KIM D W, LEE K H, LEE D. On cluster validity index for estimation of the optimal number of fuzzy clusters[J]. Pattern Recognition, 2004, 37(10): 2009-2025.
  • 9. KIM M, RAMAKRISHNA R S. New indices for cluster validity assessment[J]. Pattern Recognition Letters, 2005, 26(15): 2353-2363.
  • 10. SUN Y, ZHU Q M, CHEN Z X. An iterative initial-points refinement algorithm for categorical data clustering[J]. Pattern Recognition Letters, 2002, 23(7): 875-884.

Citing documents (2)

Second-level citing documents (6)
