Local outlier detection algorithm based on Hadoop

Cited by: 1
Abstract: To improve both the efficiency and the accuracy of local outlier detection, this paper proposes MR-DINFLO, a distributed local outlier detection algorithm based on Hadoop. Building on the INFLuenced Outlierness (INFLO) algorithm, MR-DINFLO uses the MapReduce framework to parallelize the computation of each data point's k-nearest neighbors, k-distance, reverse k-nearest neighbors, and local outlier factor, which improves detection efficiency. Distances between data objects are computed as weighted distances: information entropy is used to identify the outlier attributes, and those attributes are assigned larger weights, which improves detection accuracy. The experiment was performed on a 3-node Hadoop cluster with KDD-CUP99 as input data. With an input data set of 5 million records, MR-DINFLO achieved a detection accuracy of 0.94 and a detection time of 2589 s. The experimental results indicate that the algorithm is efficient and feasible.
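The quantities named in the abstract (entropy-derived attribute weights, weighted distance, k-distance, k-nearest neighbors, reverse k-nearest neighbors, and the INFLO score) can be illustrated with a minimal single-machine sketch. It is not the paper's MR-DINFLO implementation: the MapReduce parallelization is omitted, and the abstract does not state the weighting formula, so the entropy-proportional weights, the equal-width binning, and all function names here are assumptions made for illustration. The score follows the standard INFLO definition, the average density of a point's influence space (its kNN plus reverse kNN) divided by the point's own density, with density taken as 1 / k-distance.

```python
import math
from collections import Counter

EPS = 1e-12  # guards against a zero k-distance when points are duplicated


def attribute_entropy(values, bins=4):
    """Shannon entropy (bits) of one numeric attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def entropy_weights(data, bins=4):
    """ASSUMPTION: weight each attribute in proportion to its entropy."""
    ents = [attribute_entropy(col, bins) for col in zip(*data)]
    total = sum(ents)
    if total == 0:
        return [1.0 / len(ents)] * len(ents)
    return [e / total for e in ents]


def weighted_distance(a, b, w):
    """Weighted Euclidean distance between two points."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))


def inflo_scores(data, k, weights):
    """INFLO score per point: mean density of kNN + reverse kNN over own density."""
    n = len(data)
    knn, kdist = [], []
    for i in range(n):
        dists = sorted(
            (weighted_distance(data[i], data[j], weights), j)
            for j in range(n) if j != i
        )
        knn.append([j for _, j in dists[:k]])
        kdist.append(max(dists[k - 1][0], EPS))
    # Reverse kNN: i is a reverse neighbor of j when j is among i's kNN.
    rknn = [[] for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            rknn[j].append(i)
    scores = []
    for i in range(n):
        influence = set(knn[i]) | set(rknn[i])
        avg_density = sum(1.0 / kdist[o] for o in influence) / len(influence)
        scores.append(avg_density * kdist[i])  # avg_density / (1 / kdist[i])
    return scores


# Toy data: five clustered points and one isolated point at index 5.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
weights = entropy_weights(data)
scores = inflo_scores(data, k=2, weights=weights)
most_outlying = max(range(len(data)), key=lambda i: scores[i])
```

On this toy data the isolated point receives by far the largest score, while points inside the cluster score near or below 1. In a MapReduce setting, the per-point neighbor search and the reverse-neighbor grouping are the steps that would be distributed across nodes.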
Authors: Li Yongzheng; Hao Xinbing (National Computer System Engineering Research Institute of China, Beijing 100083, China; Information Security Research Institute of China, Beijing 100000, China)
Source: Information Technology and Network Security, 2019, No. 6, pp. 52-56, 60 (6 pages)
Funding: Gaofen (High-Resolution) Youth Fund (CFZX04061502)
Keywords: outlier detection; INFLuenced Outlierness; Hadoop; MapReduce; parallelization; information entropy