A Clustering and Outlier Detection Algorithm Based on Distance (cited by: 2)

Abstract: This paper proposes a distance-based clustering and outlier detection algorithm (DBCOD). The algorithm clusters data points according to a distance threshold, records the density of each data point during clustering, identifies outliers by a density threshold, and distinguishes valid clusters from outlier clusters by the number of data points each contains. Experimental results show that DBCOD clusters the dataset correctly, discovers clusters of arbitrary shape, runs more efficiently than DBSCAN, is insensitive to noisy data and to data input order, and detects outliers effectively.
Source: Henan Science (《河南科学》), 2007, No. 6, pp. 975-978 (4 pages)
Funding: Natural Science Foundation of Henan Province (0111051200)
Keywords: clustering algorithms; outlier detection; distance; density
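
The abstract names the ingredients of the method (a distance threshold, a per-point density, a density threshold, and a cluster-size test) but not the procedure itself. The following is a minimal sketch of one possible reading, not the paper's DBCOD algorithm. It assumes that density means the number of other points within the distance threshold and that clusters are the connected groups of points linked by that threshold. The function name `cluster_and_flag` and the parameters `eps`, `min_density`, and `min_cluster_size` are hypothetical.

```python
from collections import deque
from math import dist


def cluster_and_flag(points, eps, min_density, min_cluster_size):
    """Group points linked by distance <= eps and flag likely outliers.

    Illustrative assumption, not the published DBCOD procedure:
    density[i] is the number of other points within eps of point i.
    """
    n = len(points)
    # Brute-force neighbor lists: O(n^2) distance computations.
    neighbors = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbors]

    # Breadth-first search assigns one label per connected group.
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1

    sizes = [labels.count(c) for c in range(cluster_id)]
    # A point is flagged if it is sparse or sits in an undersized cluster.
    outliers = [
        i for i in range(n)
        if density[i] < min_density or sizes[labels[i]] < min_cluster_size
    ]
    return labels, density, outliers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    labels, density, outliers = cluster_and_flag(
        pts, eps=1.5, min_density=2, min_cluster_size=3)
    print(labels, density, outliers)
```

Because the groups are connected components of the threshold graph, the partition does not depend on input order, although the numeric labels do. The brute-force neighbor search is chosen for clarity only and says nothing about the running time reported against DBSCAN.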

References (7)

  • 1 Han Jia-wei, Kamber Micheline, Fan Ming. Data Mining: Concepts and Techniques [M]. Beijing: China Machine Press, 2001.
  • 2 Likas A, Vlassis N, Verbeek J J. The global k-means algorithm [J]. Pattern Recognition, 2003, 36: 451-461.
  • 3 Martin Ester, Hans-Peter Kriegel, Jorg Sander, et al. A density-based algorithm for discovering clusters in large spatial databases with noise: KDD'96: Proceedings of 2nd international conference on knowledge discovery and data mining [C]. Portland, Oregon: AAAI Press, 1996: 226-231.
  • 4 Knorr E M, Ng R T. Algorithms for mining distance-based outliers in large datasets: Proceedings of the 24th VLDB conference [C]. New York, USA: Morgan Kaufmann, 1998: 392-403.
  • 5 Chiu A L, Fu A W. Enhancements on local outliers detection: Proceedings of the seventh international database engineering and application symposium [C]. Hong Kong: [s.n.], 2003.
  • 6 Breunig M M, Kriegel H-P, Ng R T, et al. LOF: Identifying density-based local outliers: Proceedings of ACM SIGMOD international conference on management of data [C]. Dallas, Texas: ACM Press, 2000.
  • 7 Hsu Chihming, Chen Mingsyan. Subspace clustering of high dimensional spatial data with noise: Advances in knowledge discovery and data mining: 8th Pacific-Asia Conference [C]. Berlin: Springer, 2004: 31-40.

