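Abstract
The traditional K-medoids clustering algorithm is sensitive to the initial centers, requires many iterations, and does not achieve sufficiently high clustering accuracy. To address the sensitivity to initial centers, an ε0-neighborhood is first built for every object in the data set using a density-based idea, and the max-min distance method is used to select K ε0-neighborhoods that are dense and far apart; the core objects of these ε0-neighborhoods serve as the K initial centers. Then, to address the high iteration count and the blindness of global search in traditional K-medoids, and given effective initial centers, an update strategy is proposed that searches within the ε0-neighborhood around each initial center, reducing the number of center-update iterations. In addition, to address the low clustering accuracy of traditional K-medoids, a criterion function that assigns different weights to the within-cluster and between-cluster distances is proposed, strengthening the algorithm's evaluation criterion. The improved algorithm was tested on the Iris and Wine data sets. The experimental results show that the initial centers fall in different clusters, the number of iterations is reduced, and the clustering accuracy is improved.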
This paper builds an ε0-neighborhood for each object in the data set and selects K ε0-neighborhoods that have high density and are far from one another. The core objects of these ε0-neighborhoods are taken as the K initial centers, and the K centers are then updated with an ε0-neighborhood search strategy to reduce the number of iterations. In addition, the paper presents a weighted criterion function based on between-cluster and within-cluster distances to improve clustering accuracy. Experiments on the standard UCI data sets Iris and Wine show that the improved algorithm obtains initial centers located in different clusters, finds an optimal solution in fewer iterations, and greatly improves clustering accuracy.
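The initialization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_initial_medoids`, the use of Euclidean distance, the neighbor count as the density measure, and the `density_quantile` cutoff for what counts as "dense" are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def select_initial_medoids(X, k, eps, density_quantile=0.5):
    """Pick k dense, mutually distant objects as initial medoids (illustrative).

    X is an (n, d) array; eps is the neighborhood radius (the abstract's ε0).
    density_quantile is a hypothetical parameter, not taken from the paper.
    """
    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Density of an object = number of objects inside its eps-neighborhood
    # (the object itself is included in the count).
    density = (dist <= eps).sum(axis=1)
    # Assumption: objects at or above the chosen density quantile are candidates.
    cutoff = np.quantile(density, density_quantile)
    candidates = np.flatnonzero(density >= cutoff)
    if len(candidates) < k:
        raise ValueError("fewer dense candidates than requested medoids")
    # Start from the densest candidate.
    chosen = [int(candidates[np.argmax(density[candidates])])]
    while len(chosen) < k:
        # Max-min rule: take the candidate whose nearest chosen medoid is farthest.
        min_dist = dist[np.ix_(candidates, chosen)].min(axis=1)
        chosen.append(int(candidates[np.argmax(min_dist)]))
    return chosen
```

The function returns row indices into `X`. Restricting the max-min search to dense candidates is one plausible way to keep outliers from being picked as centers; the paper may combine density and distance differently.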
Source
《计算机仿真》
CSCD
PKU Core Journal
2016, No. 10, pp. 244-248, 277 (6 pages)
Computer Simulation
Funding
Hunan Provincial Postgraduate Research and Innovation Project (CX2014B386)
Keywords
Clustering algorithm
Local density region
Initial center
Neighborhood search strategy
Weighted criterion function