
An Outlier Detection Algorithm Based on the K-Nearest-Neighbor Tree (cited by: 4)

Outlier detection based on K-nearest neighborhood MST
Abstract: Existing algorithms perform poorly at detecting outlying clusters. To handle data sets of varied distribution shape and uneven density, this paper proposes KNMOD (outlier detection based on K-nearest neighborhood MST), an outlier detection algorithm based on a K-nearest-neighbor tree. Combining density and direction factors, the algorithm defines a dissimilarity measure based on K-nearest neighbors, builds a minimum spanning tree on this measure, and then progressively cuts the tree under constraints to obtain the outliers. The algorithm effectively detects both local outliers and local outlying clusters, and comparisons with the LOF, COF, KNN, and INFLO algorithms confirm its superior performance.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 3, pp. 669-673 (5 pages)
Funding: National Natural Science Foundation of China (grants 61272194, 61073058)
Keywords: outlier detection; outlying cluster; minimum spanning tree; dissimilarity; K-nearest neighborhood
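The pipeline outlined in the abstract (a K-nearest-neighbor dissimilarity, a minimum spanning tree built on it, then a constrained cut) can be illustrated with a rough sketch. This is not the KNMOD algorithm itself: the abstract does not give the paper's dissimilarity formula or its cutting constraints, so the measure below (Euclidean distance divided by the smaller of the two endpoints' mean k-NN distances, with no directional term) and the cut rule (drop edges above `cut_threshold`, flag components of at most `max_size` points) are assumptions made for illustration. All function and parameter names are hypothetical.

```python
import math


def knn_dissimilarity(points, k):
    """Assumed stand-in measure, NOT the paper's: Euclidean distance
    scaled by the smaller mean k-NN distance of the two endpoints."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Mean distance from each point to its k nearest neighbors (self excluded).
    scale = [max(sum(sorted(row)[1:k + 1]) / k, 1e-12) for row in dist]
    return [[dist[i][j] / min(scale[i], scale[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def minimum_spanning_tree(w):
    """Prim's algorithm on a dense symmetric weight matrix; returns (u, v, weight) edges."""
    n = len(w)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, w[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v] = w[u][v]
                parent[v] = u
    return edges


def detect_outliers(points, k, cut_threshold, max_size):
    """Cut MST edges heavier than cut_threshold (assumed rule) and return the
    sorted indices of points in components with at most max_size members."""
    n = len(points)
    edges = minimum_spanning_tree(knn_dissimilarity(points, k))
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, weight in edges:
        if weight <= cut_threshold:  # keep only the light edges
            root[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(i for g in groups.values() if len(g) <= max_size for i in g)


# Toy data: a 5x5 grid cluster, one isolated point, and a two-point outlying cluster.
grid = [(x, y) for x in range(5) for y in range(5)]
sample = grid + [(20, 20), (-15, 0), (-15, 1)]
print(detect_outliers(sample, k=3, cut_threshold=2.0, max_size=3))
```

Flagging whole small components rather than individual points is what lets a sketch like this report a tight outlying cluster, which purely point-wise scores can miss; the threshold and size bound here are hand-picked for the toy data.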


