Abstract
Clustering is an important research direction in data mining. To address several difficulties in clustering complex datasets, namely uneven density among clusters, diverse cluster shapes, and the identification of cluster centers, a clustering algorithm based on relative density and a decision graph is proposed. The algorithm uses the k-nearest-neighbor information of each sample point to compute its relative density, and borrows the center-identification method of the clustering by fast search and find of density peaks (CFSFDP) algorithm, so that cluster centers can be identified quickly and accurately and datasets of arbitrary distribution can be clustered effectively. Experimental results on seven typical test datasets show that the proposed algorithm is widely applicable. Compared with the classical density-based spatial clustering of applications with noise (DBSCAN) algorithm and the CFSFDP algorithm, it achieves better clustering results and adapts to a wider range of dataset types without significantly increasing time complexity.
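The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formula for relative density, so the definition used here (a point's inverse mean k-nearest-neighbor distance divided by the mean of the same quantity over its neighbors) is an assumption, as are the function name `relative_density_peaks` and the parameters `k` and `n_clusters`. The delta distance and the ranking by rho * delta follow the well-known CFSFDP decision-graph idea; picking the top `n_clusters` points automatically stands in for reading the centers off the decision graph by eye.

```python
import math


def relative_density_peaks(points, k=3, n_clusters=2):
    """Sketch: kNN relative density plus a CFSFDP-style decision graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # Assumed local density: inverse mean distance to the k nearest neighbours.
    local = [k / (sum(dist[i][j] for j in knn[i]) + 1e-12) for i in range(n)]
    # Assumed relative density: local density over the neighbours' mean density.
    rho = [local[i] * k / sum(local[j] for j in knn[i]) for i in range(n)]

    # CFSFDP delta: distance to the nearest point that ranks higher in density.
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Decision graph: centres combine large rho and large delta (gamma = rho * delta).
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # densest first, so a parent is always labelled before its child
        if labels[i] == -1:
            if parent[i] == -1:  # densest point was not chosen as a centre
                labels[i] = labels[min(centers, key=lambda c: dist[i][c])]
            else:
                labels[i] = labels[parent[i]]
    return labels, centers, rho, delta
```

Because the assumed relative density is a ratio of local densities, it is unchanged when a cluster is uniformly scaled, which is one plausible way to handle clusters of very different density. For example, calling `relative_density_peaks(dense + sparse, k=3, n_clusters=2)` on a tightly spaced 3x3 grid and a widely spaced 3x3 grid placed far apart should select one center in each grid.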
Authors
周世波
徐维祥
ZHOU Shi-bo; XU Wei-xiang (School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China; Navigation College, Jimei University, Xiamen 361021, China)
Source
《控制与决策》
EI
CSCD
Peking University Core Journals
2018, Issue 11, pp. 1921-1930 (10 pages)
Control and Decision
Funding
National Natural Science Foundation of China (61672002, 61272029, 41501490)
Natural Science Foundation of Fujian Province (2016J01243)
Keywords
clustering
relative density
decision graph
density peaks
k-nearest neighbors
data mining