Abstract
The clustering results of the k-Prototypes algorithm are unstable because the algorithm is sensitive to the selection of initial prototypes. It also ignores the overall difference between a data point and the samples already assigned to a cluster. To address these problems, a mixed data clustering algorithm based on dimensional frequency dissimilarity and strongly connected fusion is proposed. Multiple rounds of pre-clustering first produce a large number of sub-clusters; these sub-clusters are then merged by a strongly connected fusion strategy, according to the connectivity among them, to obtain the final clusters. Experiments on three mixed-attribute datasets from the UCI repository show that, compared with three mixed data clustering algorithms, including k-Prototypes, the proposed algorithm achieves better clustering precision and purity.
Source
《模式识别与人工智能》
EI
CSCD
PKU Core (Peking University Core Journals)
2016, No. 1, pp. 82-89 (8 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the Public Welfare Industry Research Special Project of the Ministry of Water Resources (No. 201401044)
Keywords
Dimensional Frequency Dissimilarity
Mixed Attribute
Clustering
Strongly Connected Fusion