Abstract
Density-based clustering methods fail to discover clusters when the data are unevenly distributed in density. To address this shortcoming, a clustering algorithm based on representative points and point density is proposed. The algorithm finds clusters by examining the k nearest neighbors of each point in the database. It first selects a seed point as the first representative point of a cluster and takes that point's k nearest neighbors as its representative region. If the density of a point in the representative region satisfies the density threshold, that point becomes a new representative point. Representative points are found repeatedly in this way, and representative points whose regions are connected, together with their representative regions, form one cluster. Experimental results show that the algorithm can discover clusters of arbitrary shape, size, and density.
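The abstract gives the procedure only in outline. The following minimal Python sketch illustrates the representative-point expansion under assumptions that the abstract does not settle: point density is taken to be the inverse of the mean distance to the k nearest neighbors, the density threshold is a single absolute value, seeds are tried in index order, and points that end up in no representative region are labelled noise (-1). The names `knn`, `point_density` and `representative_clustering` are hypothetical. This is not the authors' implementation.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbors of every point (the point itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def point_density(points: Sequence[Point], neighbors: List[List[int]]) -> List[float]:
    """ASSUMED density measure: 1 / mean distance to the k nearest neighbors."""
    densities = []
    for i, nbrs in enumerate(neighbors):
        mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        densities.append(math.inf if mean_d == 0 else 1.0 / mean_d)
    return densities


def representative_clustering(points: Sequence[Point], k: int,
                              density_threshold: float) -> List[int]:
    """Return a cluster id per point; -1 marks points that are neither
    representatives nor inside any representative region (noise)."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbors = knn(points, k)
    density = point_density(points, neighbors)
    labels = [-1] * len(points)
    is_rep = [False] * len(points)
    cluster = 0
    for seed in range(len(points)):
        # A seed must be unassigned and itself satisfy the density threshold.
        if labels[seed] != -1 or density[seed] < density_threshold:
            continue
        labels[seed] = cluster
        is_rep[seed] = True
        queue = deque([seed])
        while queue:
            rep = queue.popleft()
            # The k nearest neighbors of a representative form its representative region.
            for q in neighbors[rep]:
                if labels[q] == -1:
                    labels[q] = cluster  # covered by this cluster's region
                # A dense point in the region becomes a new representative, so
                # regions linked through representatives grow into one cluster.
                if labels[q] == cluster and not is_rep[q] and density[q] >= density_threshold:
                    is_rep[q] = True
                    queue.append(q)
        cluster += 1
    return labels
```

For example, given a 4x4 grid with spacing 1, a second 4x4 grid with spacing 0.25 placed far away, and one distant isolated point, calling `representative_clustering(points, k=4, density_threshold=0.5)` labels the two grids 0 and 1 and the isolated point -1, even though the grids differ fourfold in density. An absolute threshold is simply the most literal reading of the abstract; a threshold defined relative to each representative's own density would be an equally plausible variant.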
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal
2008, Issue 28, pp. 136-139 (4 pages)
Keywords
data mining
clustering
point density
representative point
density threshold