Improved k-means initialization method based on data density (cited by: 4)
Abstract: K-means is widely used, but its performance and stability depend heavily on the initialization step, especially on the choice of initial cluster centers. Reasonable cluster centers should lie in regions where the data are dense. Based on this assumption, the paper proposes an initialization method that relies on the local density of the data: it first defines a local density function and then selects the initial cluster centers from high-density regions. Compared with similar algorithms, the method has the following characteristics: it discovers the local density of the data distribution on its own; it performs better on data with a relatively large number of categories; it is robust to outliers and noise; and it is easy to implement.
Author: 沈国珍 (Shen Guozhen)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 11, pp. 139-144, 166 (7 pages)
Keywords: clustering; k-means algorithm; cluster centers; density function; initialization; data density
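
The idea in the abstract, scoring each point by a local density and then taking initial centers from high-density regions, can be illustrated with a minimal Python sketch. The abstract does not give the paper's density function or selection rule, so everything concrete here is an assumption made for illustration: the Gaussian-kernel density, the greedy "densest first, at least `min_sep` apart" selection, and the names `local_density`, `density_based_init`, `bandwidth`, and `min_sep` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def local_density(X, bandwidth):
    """Gaussian-kernel density score for every point.

    ASSUMPTION: a stand-in for the paper's local density function,
    which the abstract does not specify.
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).sum(axis=1)


def density_based_init(X, k, bandwidth=1.0, min_sep=None):
    """Pick k initial centers from high-density points.

    ASSUMPTION: the greedy rule (densest first, at least `min_sep`
    away from centers already chosen) is illustrative only.
    """
    X = np.asarray(X, dtype=float)
    if k > len(X):
        raise ValueError("k cannot exceed the number of points")
    if min_sep is None:
        min_sep = 2.0 * bandwidth

    order = np.argsort(-local_density(X, bandwidth))  # densest first
    chosen = []
    for idx in order:
        if all(np.linalg.norm(X[idx] - X[j]) >= min_sep for j in chosen):
            chosen.append(idx)
        if len(chosen) == k:
            break

    # Fallback: if the separation rule was too strict to yield k points,
    # fill the remaining slots with the densest points not yet chosen.
    for idx in order:
        if len(chosen) == k:
            break
        if idx not in chosen:
            chosen.append(idx)
    return X[chosen]


# Toy data: three tight blobs plus one distant outlier.
rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, scale=0.3, size=(50, 2))
         for c in ([0, 0], [5, 5], [0, 5])]
X = np.vstack(blobs + [np.array([[20.0, 20.0]])])

centers = density_based_init(X, k=3, bandwidth=0.5)
print(centers)
```

In this toy setup the isolated point at (20, 20) receives a very low density score, so it is not picked as a center, which is the kind of robustness to outliers the abstract describes. The returned array could then be passed as the starting centers to any standard k-means implementation; `bandwidth` and `min_sep` would need tuning for real data.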