
A Distributed Outlier Detection Algorithm Based on Density Clustering (cited by: 10)
Abstract Local outlier detection is an important research direction in data mining. With the explosive growth of data, mining outliers has become more meaningful, yet current detection algorithms have many shortcomings when handling large-scale data. This paper combines the traditional outlier detection algorithm LOF with the MapReduce framework of the Hadoop distributed platform to implement a parallelization strategy, and improves it with the density clustering algorithm DBSCAN. Compared with the LOF algorithm and other improved algorithms, the proposed algorithm improves both efficiency and accuracy. Moreover, as the number of data nodes in the Hadoop system increases, the running efficiency of the algorithm improves accordingly. The experimental results show that the proposed algorithm is feasible for processing large-scale data.
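The abstract does not say how LOF, DBSCAN, and MapReduce are wired together, so the following is only a minimal single-machine sketch under stated assumptions, not the paper's method. It assumes that a DBSCAN-style core-point test is used to prune points lying in dense regions, and that LOF is then computed only for the remaining candidates. The function name `detect_outliers` and the parameters `k`, `eps`, `min_pts`, and `threshold` are hypothetical. In a MapReduce setting the per-point work would presumably be spread across nodes, which is not shown here.

```python
import math
from functools import lru_cache


def detect_outliers(points, k, eps, min_pts, threshold):
    """Return {index: LOF score} for non-core points whose LOF exceeds threshold.

    Assumption (not from the paper): DBSCAN core points are treated as
    inliers and skipped, so LOF is evaluated only for the candidates.
    """
    pts = [tuple(p) for p in points]
    n = len(pts)

    @lru_cache(maxsize=None)
    def knn(i):
        # k nearest neighbours of point i as (distance, index) pairs.
        # Simplification: exactly k neighbours, ties at the k-distance are cut.
        d = sorted((math.dist(pts[i], pts[j]), j) for j in range(n) if j != i)
        return tuple(d[:k])

    def k_distance(i):
        # Distance from point i to its k-th nearest neighbour.
        return knn(i)[-1][0]

    @lru_cache(maxsize=None)
    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from i to its neighbours.
        reach = [max(k_distance(j), d) for d, j in knn(i)]
        return len(reach) / max(sum(reach), 1e-12)  # guard against duplicates

    def lof(i):
        # Mean ratio of the neighbours' density to the point's own density.
        nbrs = knn(i)
        return sum(lrd(j) for _, j in nbrs) / (len(nbrs) * lrd(i))

    def is_core(i):
        # DBSCAN core point: at least min_pts points (itself included) within eps.
        return sum(1 for q in pts if math.dist(pts[i], q) <= eps) >= min_pts

    outliers = {}
    for i in range(n):
        if is_core(i):
            continue  # pruned: assumed to lie inside a dense cluster
        score = lof(i)
        if score > threshold:
            outliers[i] = score
    return outliers


# A 3x3 grid forms one dense cluster; the last point lies far away from it.
grid = [(x, y) for x in range(3) for y in range(3)]
result = detect_outliers(grid + [(10, 10)], k=3, eps=1.1, min_pts=3, threshold=2.0)
print(sorted(result))
```

Because `lrd` is cached and computed lazily, densities are evaluated only for the candidates and their neighbours, which is where the pruning step would save work on large data sets.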
Authors LIU Yamei (刘亚梅); YAN Renwu (闫仁武) (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003)
Source Computer & Digital Engineering (《计算机与数字工程》), 2019, No. 6, pp. 1320-1325 (6 pages)
Keywords local outlier detection; density clustering; Hadoop; MapReduce; parallelization; local outlier factor
