摘要
传统K-medoids聚类算法的聚类结果随初始中心点不同而波动,且计算复杂度较高不适于处理大规模数据集;快速K-medoids聚类算法通过选择合适的初始聚类中心改进了传统K-medoids聚类算法,但是快速K-medoids聚类算法的初始聚类中心有可能位于同一类簇。为克服传统K-medoids聚类算法和快速K-medoids聚类算法的缺陷,提出一种基于粒计算的K-medoids聚类算法。算法引入粒度概念,定义新的样本相似度函数,基于等价关系产生粒子,根据粒子包含样本多少定义粒子密度,选择密度较大的前K个粒子的中心样本点作为K-medoids聚类算法的初始聚类中心,实现K-medoids聚类。UCI机器学习数据库数据集以及随机生成的人工模拟数据集实验测试,证明了基于粒计算的K-medoids聚类算法能得到更好的初始聚类中心,聚类准确率和聚类误差平方和优于传统K-medoids和快速K-medoids聚类算法,具有更稳定的聚类结果,且适用于大规模数据集。
Traditional K-medoids clustering algorithm has some drawbacks,such as its clustering results being sensitive to initial cluster centers and its deficiency in large datasets.Although the fast K-medoids algorithm overcame the shortcomings of traditional K-medoids,it has the potential disadvantages of selecting the exemplars in the same cluster as initial seeds for different clusters.To overcome the shortcomings of the traditional K-medoids and the fast K-medoids clustering algorithms,a granular computing based K-medoids clustering algorithm was proposed in this paper.The algorithm defined a new similarity function between samples via pooling granularity,where the granules were produced via the equivalence relationship.The density of a granule was defined according to the number of samples in it,after that the K samples closest to the centers of the first K granules were selected as the initial centers for K-medoids clustering algorithm to cluster datasets.The experimental results on the datasets from UCI machine learning repository and on the synthetic datasets all demonstrate that the new granular computing based K-medoids clustering algorithm can find much better initial centers.Its clustering accuracy and its clustering error are better than those of the traditional K-medoids and the fast K-medoids clustering algorithms.It can get much more stable results and can be applied to cluster large datasets.
出处
《计算机应用》
CSCD
北大核心
2012年第7期1973-1977,共5页
journal of Computer Applications
基金
陕西省自然科学基金资助项目(2010JM3004)
中央高校基本科研业务费专项(GK201102007)
陕西师范大学2011年研究生培养创新基金资助项目(2011CXS029)