Abstract
DBSCAN is a density-based clustering algorithm with many strengths and some weaknesses. One weakness is its sensitivity to the input parameter Eps: because DBSCAN applies a single global Eps value, clustering quality degrades considerably when data density is uneven or when the distances between clusters differ widely. This paper addresses the choice of Eps and the problem of uneven density by proposing a new data partitioning method. The distance values on the vertical axis of the k-dist plot are clustered in one dimension, and the corresponding horizontal-axis positions are then compared to form partitions, so that the data within each partition is as uniform in density as possible. Experiments show that the improved algorithm clearly alleviates the degradation in clustering quality caused by a global Eps and yields more accurate clustering results.
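The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which one-dimensional clustering is used, so the sketch substitutes a simple gap rule on the sorted k-dist curve. The function names, the `gap_ratio` threshold, the default `k=4`, and the choice of the largest k-dist in a partition as its local Eps are all assumptions made for illustration.

```python
import math


def k_dist(points, k=4):
    """Return, for each point, the distance to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def partition_by_k_dist(points, k=4, gap_ratio=2.0):
    """Split point indices into partitions of similar density.

    Assumption: 1-D clustering of the k-dist values is approximated by
    cutting the sorted k-dist curve wherever the value jumps by more
    than `gap_ratio` times the previous value.
    """
    kd = k_dist(points, k)
    # Sorting by k-dist gives the horizontal axis of the k-dist plot.
    order = sorted(range(len(points)), key=lambda i: kd[i])
    partitions = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kd[cur] > gap_ratio * max(kd[prev], 1e-12):
            partitions.append([cur])  # sharp jump: start a new partition
        else:
            partitions[-1].append(cur)
    # Assumed local Eps: the largest k-dist value inside each partition.
    return [(part, max(kd[i] for i in part)) for part in partitions]
```

Each returned pair holds the point indices of one partition and a candidate local Eps, so DBSCAN could then be run per partition with its own Eps instead of one global value.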
Source
Computer Technology and Development (《计算机技术与发展》)
2011, No. 2, pp. 30-33, 38 (5 pages)
Funding
Key Education Research Project of Anhui Province (KJ2009A57)