Abstract
An analysis of the quantum potential, the particle-distribution mechanism of quantum mechanics, and the CQC (categorical quantum clustering) algorithm for categorical attribute data shows that CQC measures the dissimilarity between categorical data objects with the traditional Hamming dissimilarity measure. This measure ignores the meaning of the categorical attribute values themselves and the characteristic correlations among those values, which leads to poor clustering accuracy. An improved algorithm, MCQC (modified categorical quantum clustering), is therefore proposed. It computes the dissimilarity between different values of the same attribute from the correlations among the data objects, and from these value dissimilarities computes the dissimilarity measure between data objects, thereby improving clustering accuracy. The simulation experiments use three data sets: the real soybean disease data set, the real congressional voting data set, and a synthetic data set built by extracting the discrete attribute dimensions of the KDD-CUP99 training set. The experimental results show that the proposed algorithm is effective and feasible, and that its clustering accuracy on pure categorical, binary, and mixed attribute data is clearly higher than that of the CQC algorithm.
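The contrast between the Hamming measure and a correlation-aware value dissimilarity can be illustrated with a small sketch. The abstract does not give the MCQC formulas, so everything below is an assumption for illustration: `value_dissimilarity` compares two values of one attribute by the total variation distance between the distributions of the other attributes that co-occur with each value (a swapped-in, context-based measure, not necessarily the authors' one), and `quantum_potential` evaluates a Horn-Gottlieb style potential with the categorical dissimilarity substituted for Euclidean distance. All function names and the `sigma` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def hamming(a, b):
    """Classic Hamming dissimilarity: number of attributes whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def value_dissimilarity(data, attr):
    """Hypothetical co-occurrence-based distance between values of attribute `attr`.

    For every other attribute j, the conditional distributions P(A_j | A_attr = x)
    and P(A_j | A_attr = y) are compared with the total variation distance; the
    result is averaged over j and lies in [0, 1].
    """
    others = [j for j in range(len(data[0])) if j != attr]
    freq = Counter(row[attr] for row in data)
    # cond[x][j] counts the values of attribute j among rows where attr == x
    cond = defaultdict(lambda: defaultdict(Counter))
    for row in data:
        for j in others:
            cond[row[attr]][j][row[j]] += 1
    dist = {}
    for x, y in combinations(sorted(freq), 2):
        total = 0.0
        for j in others:
            vals = set(cond[x][j]) | set(cond[y][j])
            total += 0.5 * sum(
                abs(cond[x][j][v] / freq[x] - cond[y][j][v] / freq[y]) for v in vals
            )
        # With a single attribute there is no context, so fall back to Hamming (1).
        d = total / len(others) if others else 1.0
        dist[(x, y)] = dist[(y, x)] = d
    return dist


def modified_dissimilarity(data):
    """Object dissimilarity = sum of per-attribute value dissimilarities (assumption)."""
    tables = [value_dissimilarity(data, a) for a in range(len(data[0]))]

    def d(a, b):
        return sum(0.0 if x == y else tables[k][(x, y)]
                   for k, (x, y) in enumerate(zip(a, b)))
    return d


def quantum_potential(data, dist, sigma=1.0):
    """Horn-Gottlieb style potential at each object, using `dist` instead of
    Euclidean distance. Constant terms are dropped and the result is shifted
    so that its minimum is 0; low potential suggests a cluster centre."""
    pot = []
    for xj in data:
        d2 = [dist(xj, xi) ** 2 for xi in data]
        w = [math.exp(-v / (2 * sigma ** 2)) for v in d2]
        pot.append(sum(v * wi for v, wi in zip(d2, w)) / (2 * sigma ** 2 * sum(w)))
    vmin = min(pot)
    return [v - vmin for v in pot]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("c", "x")]
    d = modified_dissimilarity(data)
    print(hamming(data[0], data[4]))  # 1
    print(d(data[0], data[4]))        # 0.0: "a" and "c" co-occur with the same value of attribute 1
    print(quantum_potential(data, d, sigma=1.0))
```

In the toy data, values "a" and "c" differ under Hamming but always co-occur with "x", so the context-based measure treats them as equivalent; this is the kind of value-level correlation that the abstract says the Hamming measure ignores.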
Source
Journal of Lanzhou University of Technology (《兰州理工大学学报》)
CAS
Peking University Core Journal
2009, No. 3, pp. 98-102 (5 pages)
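Funding
Natural Science Foundation of Gansu Province (3ZS051-A25-032)
Gansu Province Graduate Supervisor Fund for Higher Education Institutions (050301)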
Keywords
categorical attribute data
quantum clustering
clustering algorithm
dissimilarity measure