Abstract
Local outlier detection is an important research direction in data mining. With the explosive growth of data, mining outliers has become increasingly meaningful, yet current detection algorithms have many shortcomings when processing large-scale data. This paper combines the traditional outlier detection algorithm LOF with the MapReduce distributed framework on the Hadoop distributed platform to implement a parallelization strategy, and improves it using the density-based clustering algorithm DBSCAN. Compared with the LOF algorithm and other improved algorithms, the proposed algorithm improves both efficiency and accuracy. Moreover, as the number of data nodes in the Hadoop system increases, the running efficiency of the algorithm improves accordingly. The experimental results show that the proposed algorithm is feasible for processing large-scale data.
Authors
刘亚梅
闫仁武
LIU Yamei; YAN Renwu (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003)
Source
《计算机与数字工程》
2019, No. 6, pp. 1320-1325 (6 pages)
Computer & Digital Engineering