Abstract
A new Global k-Prototype (GKP) algorithm is proposed for clustering data with mixed numeric and categorical attributes. The algorithm first randomly selects a sufficiently large number of initial prototypes so that they cover the global distribution of the data set. It then iteratively eliminates redundant prototypes using an elimination criterion function. Systematic experiments on widely used datasets in this area verify the effectiveness and convergence of the algorithm and show that it performs consistently. Compared with other well-known mixed-data clustering algorithms, the proposed algorithm significantly improves clustering accuracy, offering a new approach to the problem of clustering mixed data.
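The two stages described in the abstract (oversample the initial prototypes, then prune them) can be illustrated with a minimal Python sketch. It is not the authors' GKP implementation: the abstract does not give the elimination criterion function, so the sketch substitutes a stand-in rule that drops the prototype whose removal increases the total cost least. The mixed dissimilarity (squared Euclidean distance plus a weight `gamma` times the number of categorical mismatches) follows the usual k-prototypes convention, and all names and parameters (`eliminate_to_k`, `n_init`, `n_num`, `gamma`, `n_iter`) are hypothetical.

```python
import random
from collections import Counter


def mixed_distance(x, proto, n_num, gamma):
    """Squared Euclidean distance on the first n_num (numeric) fields
    plus gamma times the number of mismatched categorical fields."""
    num = sum((a - b) ** 2 for a, b in zip(x[:n_num], proto[:n_num]))
    cat = sum(a != b for a, b in zip(x[n_num:], proto[n_num:]))
    return num + gamma * cat


def assign(data, protos, n_num, gamma):
    """Index of the nearest prototype for every record."""
    return [
        min(range(len(protos)),
            key=lambda j: mixed_distance(x, protos[j], n_num, gamma))
        for x in data
    ]


def update(data, labels, protos, n_num):
    """Recompute each prototype: means for numeric fields, modes for
    categorical fields. A prototype with no members is left unchanged."""
    new_protos = []
    for j, old in enumerate(protos):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if not members:
            new_protos.append(old)
            continue
        means = [sum(m[i] for m in members) / len(members)
                 for i in range(n_num)]
        modes = [Counter(m[i] for m in members).most_common(1)[0][0]
                 for i in range(n_num, len(old))]
        new_protos.append(tuple(means + modes))
    return new_protos


def total_cost(data, protos, n_num, gamma):
    """Sum over records of the distance to the nearest prototype."""
    return sum(min(mixed_distance(x, p, n_num, gamma) for p in protos)
               for x in data)


def eliminate_to_k(data, k, n_init, n_num, gamma=1.0, n_iter=10, seed=0):
    """Start from n_init randomly chosen records as prototypes, then
    alternate k-prototypes refinement with removal of one prototype
    until only k remain."""
    rng = random.Random(seed)
    protos = rng.sample(data, n_init)
    while True:
        for _ in range(n_iter):
            labels = assign(data, protos, n_num, gamma)
            protos = update(data, labels, protos, n_num)
        if len(protos) <= k:
            return protos, assign(data, protos, n_num, gamma)
        # Stand-in elimination rule (an assumption, not the paper's
        # criterion): drop the prototype whose removal raises the
        # total cost the least.
        drop = min(
            range(len(protos)),
            key=lambda j: total_cost(
                data, protos[:j] + protos[j + 1:], n_num, gamma),
        )
        protos = protos[:drop] + protos[drop + 1:]
```

Each record is assumed to be a tuple with its numeric fields first and its categorical fields after them, for example `eliminate_to_k(data, k=2, n_init=4, n_num=2)` for records such as `(0.0, 0.1, "a")`. Starting with more prototypes than clusters is what makes the result less sensitive to an unlucky initial draw, at the price of extra refinement passes.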
Source
Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报(工学版)》)
EI
CAS
CSCD
Peking University Core Journals
2013, No. 1, pp. 130-134 (5 pages)
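Funding
National Natural Science Foundation of China (61175023, 60973092, 60903097)
Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education
China Scholarship Council project (2010617098)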
Keywords
artificial intelligence
clustering
data mining
K-prototypes algorithm
mixed attribute data