Abstract
This paper proposes a distance-based clustering and outlier detection algorithm (DBCOD). The algorithm clusters data points according to a distance threshold, records the density of each data point during clustering, and uses a density threshold to decide whether a point is an outlier; valid clusters and outlier clusters are distinguished by the number of data points they contain. Experimental results show that DBCOD clusters the datasets correctly, discovers clusters of arbitrary shape, runs more efficiently than DBSCAN, is insensitive to noise and to the order in which the data are input, and detects outliers effectively.
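The abstract does not spell out the individual steps of DBCOD, so the following is only a minimal sketch of one possible reading of it, not the authors' procedure. It assumes that "density" means the number of other points within the distance threshold, that clusters are the connected groups of points linked by that threshold, and that the function name `dbcod_sketch` and the parameters `eps`, `min_density`, and `min_cluster_size` are hypothetical names introduced here for illustration.

```python
from math import dist


def dbcod_sketch(points, eps, min_density, min_cluster_size):
    """Illustrative reading of the abstract; NOT the paper's actual procedure."""
    n = len(points)
    # Neighbours of each point within the distance threshold (excluding itself).
    neighbours = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Density recorded per point: assumed here to be the neighbour count.
    density = [len(nb) for nb in neighbours]

    # Cluster by distance: connected components of the "within eps" graph.
    labels = [-1] * n
    clusters = []
    for start in range(n):
        if labels[start] != -1:
            continue
        cid = len(clusters)
        labels[start] = cid
        members, stack = [], [start]
        while stack:
            i = stack.pop()
            members.append(i)
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = cid
                    stack.append(j)
        clusters.append(sorted(members))

    # Outliers: points below the density threshold, plus all members of
    # clusters that are too small to count as valid (assumed rule).
    # A low-density point may still sit inside a valid cluster in this sketch.
    outliers = {i for i in range(n) if density[i] < min_density}
    valid = []
    for members in clusters:
        if len(members) < min_cluster_size:
            outliers.update(members)
        else:
            valid.append(members)
    return valid, sorted(outliers), density


# Two tight squares of four points each and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
valid, outliers, density = dbcod_sketch(points, eps=1.5, min_density=1,
                                        min_cluster_size=3)
```

On this toy input the two squares come back as valid clusters and the isolated point is the only outlier. Because the grouping depends only on pairwise distances, reordering the input changes the index labels but not which points end up together, which is consistent with the order-insensitivity claimed in the abstract. The pairwise neighbour search is quadratic in the number of points and says nothing about the efficiency comparison with DBSCAN reported in the paper.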
Source
Henan Science (《河南科学》)
2007, No. 6, pp. 975-978 (4 pages)
Funding
Natural Science Foundation of Henan Province (Grant No. 0111051200)
Keywords
clustering algorithms
outlier detection
distance
density