
Anomaly Detection Algorithm Based on the Local Distance of Density-Based Sampling Data
Cited by: 18
Abstract: Anomaly detection is an important research area in data mining. Current anomaly detection methods based on distance or nearest neighbors take prohibitively long to run on massive high-dimensional data. Many improved methods raise computational efficiency, but their detection quality is unsatisfactory. To address this, this paper proposes an anomaly detection algorithm based on the local distance of density-biased sampling data. First, a density-biased probability sampling method draws a probabilistic sample from the dataset to be examined. Next, a local-distance-based local outlier detection method computes a local outlier factor for every point in the sample; the resulting factor is both the local outlier factor of the sampled data and an approximation of the global outlier factor of the whole dataset. Finally, the points are ranked by their outlier factors: the larger a point's factor, the more likely it is an outlier. Experimental results show that, compared with existing algorithms, the proposed algorithm achieves higher detection accuracy with less computation time, performs well on data of various dimensionalities and sizes, and scales well.
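The abstract describes a three-step pipeline: density-biased sampling, a local-distance outlier factor computed on the sample, and ranking by that factor. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' SLDOF implementation. The abstract does not specify the density estimator, the sampling probability, or the exact local-distance factor, so the sketch assumes (1) local density is approximated by the number of points sharing a grid cell of side `cell`, (2) each point is kept independently with probability proportional to density^(-alpha), scaled toward a target expected sample size and capped at 1, and (3) the local-distance factor is the standard LDOF ratio, i.e. a point's mean distance to its k nearest neighbors divided by the mean pairwise distance among those neighbors, with neighbors taken from the sample only. All function and parameter names (`grid_density`, `density_biased_sample`, `ldof_scores`, `rank_sampled_outliers`, `cell`, `alpha`, `k`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def grid_density(data: Sequence[Point], cell: float) -> List[int]:
    """Approximate each point's local density by the number of points in its grid cell.

    Assumption: grid-cell counts stand in for the paper's (unspecified) density estimate.
    """
    keys = [tuple(math.floor(x / cell) for x in p) for p in data]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def density_biased_sample(data: Sequence[Point], sample_size: float, cell: float,
                          alpha: float = 1.0, seed: int = 0) -> List[Point]:
    """Keep each point with probability proportional to density**(-alpha), capped at 1.

    Probabilities are scaled so the expected sample size is about `sample_size`
    (the cap can make it smaller); sparse regions are therefore over-represented.
    """
    rng = random.Random(seed)
    weights = [d ** (-alpha) for d in grid_density(data, cell)]
    scale = sample_size / sum(weights)
    return [p for p, w in zip(data, weights) if rng.random() < min(1.0, w * scale)]


def ldof_scores(sample: Sequence[Point], k: int) -> List[float]:
    """LDOF of every sampled point, using its k nearest neighbors within the sample only."""
    if k < 2 or len(sample) <= k:
        raise ValueError("need k >= 2 and more than k sampled points")
    scores = []
    for i, p in enumerate(sample):
        others = [j for j in range(len(sample)) if j != i]
        nn = sorted(others, key=lambda j: math.dist(p, sample[j]))[:k]
        # Mean distance from p to its k nearest neighbors.
        knn_dist = sum(math.dist(p, sample[j]) for j in nn) / k
        # Mean pairwise distance among those neighbors (k*(k-1) ordered pairs).
        inner = sum(math.dist(sample[a], sample[b]) for a in nn for b in nn if a != b)
        inner_dist = inner / (k * (k - 1))
        if inner_dist > 0:
            scores.append(knn_dist / inner_dist)
        else:
            # All neighbors coincide: p is outlying only if it lies elsewhere.
            scores.append(math.inf if knn_dist > 0 else 0.0)
    return scores


def rank_sampled_outliers(data: Sequence[Point], sample_size: float, cell: float,
                          k: int, alpha: float = 1.0,
                          seed: int = 0) -> List[Tuple[float, Point]]:
    """Sample, score, and sort the sampled points by LDOF, most outlying first."""
    sample = density_biased_sample(data, sample_size, cell, alpha, seed)
    scores = ldof_scores(sample, k)
    return sorted(zip(scores, sample), key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    # Toy data: a dense 10x10 grid in [0, 0.9]^2 plus one isolated point.
    cluster = [(i / 10, j / 10) for i in range(10) for j in range(10)]
    data = cluster + [(10.0, 10.0)]
    for score, point in rank_sampled_outliers(data, sample_size=60, cell=1.0, k=5)[:3]:
        print(f"{point}: LDOF = {score:.2f}")
```

On the toy data, every grid point shares one cell (density 100) while the isolated point sits alone (density 1), so the isolated point's inclusion probability is capped at 1 and it is always retained, whereas each grid point is kept with probability 0.3; the isolated point should then head the ranking. Over-sampling sparse regions in this way is presumably what allows a factor computed on the sample to approximate the global factor. The brute-force neighbor search is O(m²) in the sample size m and is meant only for illustration.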
Authors: 付培国 (FU Pei-Guo), 胡晓惠 (HU Xiao-Hui) (University of Chinese Academy of Sciences, Beijing 100049, China; Science and Technology on Integrated Information System Laboratory, Institute of Software, The Chinese Academy of Sciences, Beijing 100190, China)
Source: Journal of Software (《软件学报》), 2017, No. 10, pp. 2625-2639 (15 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
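Funding: National Natural Science Foundation of China (U1435220); National High Technology Research and Development Program of China (863 Program) (2012AA011206)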
Keywords: anomaly detection; outlier factor of local set; local distance; density-based sampling; SLDOF algorithm