Abstract
To improve both the efficiency and the accuracy of local outlier detection, this paper proposes MR-DINFLO, a distributed local outlier detection algorithm based on Hadoop. Building on the INFLuenced Outlierness (INFLO) algorithm, MR-DINFLO uses the MapReduce framework to parallelize the computation of each data point's k-nearest neighbors, k-distance, reverse k-nearest neighbors, and local outlier factor, which improves detection efficiency. Distances between data objects are computed as weighted distances: information entropy is used to identify outlier attributes, and those attributes are assigned larger weights, which improves detection accuracy. Experiments were run on a 3-node Hadoop cluster with KDD-CUP99 as input data. With an input data set of 5 million records, MR-DINFLO achieved a detection accuracy of 0.94 and a detection time of 2589 s. The experimental results indicate that the algorithm is both efficient and feasible.
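The pipeline summarized in the abstract (weighted distance, k-nearest neighbors, k-distance, reverse k-nearest neighbors, outlier factor) can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the weighting formula, so weighting each attribute in proportion to its binned Shannon entropy is an assumption, and the names `attribute_entropy`, `entropy_weights`, `weighted_distance` and `inflo_scores` are hypothetical. The INFLO score follows the standard definition (average density of the influence space divided by the point's own density, with density taken as 1 / k-distance). MapReduce parallelization is omitted, and ties in the k-distance are ignored for brevity.

```python
import math
from collections import Counter


def attribute_entropy(values, bins=4):
    """Shannon entropy (in bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def entropy_weights(data, bins=4):
    """ASSUMPTION: weight each attribute in proportion to its entropy.

    The abstract only says that entropy identifies outlier attributes,
    which then receive larger weights; the exact rule is not stated.
    """
    entropies = [attribute_entropy(col, bins) for col in zip(*data)]
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(entropies)] * len(entropies)
    return [e / total for e in entropies]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two data objects."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def inflo_scores(data, k, weights):
    """INFLO score per point; larger values suggest stronger outliers."""
    n = len(data)
    knn, kdist = [], []
    for i in range(n):
        ranked = sorted(
            (weighted_distance(data[i], data[j], weights), j)
            for j in range(n) if j != i
        )
        knn.append([j for _, j in ranked[:k]])
        # Guard against a zero k-distance caused by duplicate points.
        kdist.append(max(ranked[k - 1][0], 1e-12))

    # Reverse k-nearest neighbors: q is in rnn[p] if p is in knn[q].
    rnn = [[] for _ in range(n)]
    for q in range(n):
        for p in knn[q]:
            rnn[p].append(q)

    scores = []
    for p in range(n):
        influence = set(knn[p]) | set(rnn[p])  # influence space of p
        avg_density = sum(1.0 / kdist[o] for o in influence) / len(influence)
        scores.append(avg_density * kdist[p])  # avg density / (1 / kdist[p])
    return scores


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    w = entropy_weights(points)
    for point, score in zip(points, inflo_scores(points, k=2, weights=w)):
        print(point, round(score, 3))
```

In this toy data set the isolated point (10, 10) receives by far the largest score. In a distributed setting, each of the per-point steps above (neighbor search, reverse-neighbor grouping, score computation) would be a natural candidate for its own map/reduce stage, though how MR-DINFLO actually partitions the work is not described in the abstract.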
Authors
Li Yongzheng (李永政)
Hao Xinbing (郝新兵)
National Computer System Engineering Research Institute of China, Beijing 100083, China; Information Security Research Institute of China, Beijing 100000, China
Source
Information Technology and Network Security (《信息技术与网络安全》)
2019, No. 6, pp. 52-56, 60 (6 pages)
Funding
Gaofen Youth Fund (高分青年基金), grant CFZX04061502