Ordinal Anomaly Probability Algorithm for Anomaly Detection Problems of Massive Data Sets
Times cited: 4
Abstract: Existing anomaly detection algorithms tend to be slow, inaccurate, and unstable. To address these problems, an ordinal anomaly probability algorithm (OAP) is proposed that extracts anomalies by ranking anomaly probabilities. Because anomalies are easier than normal points to isolate by uniformly partitioning the data space, OAP estimates the anomaly probability of each data point from its isolation depth in uniform N-ary partition trees, which yields an ordering of anomaly probabilities. It then builds a list of the k points with the largest anomaly probabilities; the points in this list are reported as the anomalies. OAP requires no distance or density computation, which reduces its complexity to O(n). Experimental results show that, on massive data sets whose size grows linearly, the CPU time consumed by OAP also grows linearly. Compared with iForest, OAP is 30 times faster, improves accuracy by 20%-30%, and is highly stable: repeated runs on the same data set give consistent results.
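The abstract states the idea (a point that is isolated at a shallow depth under uniform partitioning is a likely anomaly, and the k shallowest points are kept) but not how the trees are built or how depth is turned into a probability. The Python sketch below illustrates that idea under assumptions that are not taken from the paper: each node's cell is cut into `n_way` equal-width slabs along one dimension, the dimension cycles with depth, recursion stops when a point is alone or a hypothetical depth cap `max_depth` is reached, one tree is built per starting dimension, and points are ranked by total depth rather than by a calibrated probability. The names `isolation_depths` and `top_k_anomalies` are likewise hypothetical.

```python
import heapq
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def isolation_depths(points: Sequence[Point], n_way: int = 4,
                     max_depth: int = 16, start_dim: int = 0) -> List[int]:
    """Depth at which each point is alone in one uniform N-ary partition tree.

    Assumed construction (hypothetical, not from the paper): at depth t the
    current cell is cut along dimension (start_dim + t) % dim into n_way
    equal-width slabs. Points never isolated get depth max_depth.
    """
    if not points:
        return []
    dim = len(points[0])
    depths = [max_depth] * len(points)
    lo = [min(p[j] for p in points) for j in range(dim)]
    hi = [max(p[j] for p in points) for j in range(dim)]
    # Work items: (point indices in the cell, cell lower/upper bounds, depth).
    stack = [(list(range(len(points))), lo, hi, 0)]
    while stack:
        idx, lo, hi, depth = stack.pop()
        if len(idx) == 1:
            depths[idx[0]] = depth  # isolated at this depth
            continue
        if depth >= max_depth:
            continue  # depth cap reached; these points keep max_depth
        j = (start_dim + depth) % dim
        width = (hi[j] - lo[j]) / n_way
        cells = defaultdict(list)
        for i in idx:
            # Index of the equal-width slab; clamp for the upper edge and rounding.
            c = int((points[i][j] - lo[j]) / width) if width > 0 else 0
            cells[min(max(c, 0), n_way - 1)].append(i)
        for c, sub in cells.items():
            sub_lo, sub_hi = list(lo), list(hi)
            sub_lo[j] = lo[j] + c * width
            sub_hi[j] = lo[j] + (c + 1) * width
            stack.append((sub, sub_lo, sub_hi, depth + 1))
    return depths


def top_k_anomalies(points: Sequence[Point], k: int, n_way: int = 4,
                    max_depth: int = 16) -> List[int]:
    """Indices of the k points with the smallest total isolation depth."""
    if not points:
        return []
    total = [0] * len(points)
    # Assumption: one tree per starting dimension; summing depths over a fixed
    # number of trees gives the same ordering as averaging them.
    for s in range(len(points[0])):
        for i, d in enumerate(isolation_depths(points, n_way, max_depth, s)):
            total[i] += d
    # Shallower isolation => ranked as more likely anomalous; ties broken by index.
    return heapq.nsmallest(k, range(len(points)), key=lambda i: (total[i], i))


if __name__ == "__main__":
    # A tight 10 x 10 grid of points near (0.5, 0.5) plus one far-away point.
    data = [(0.5 + 0.001 * (i % 10), 0.5 + 0.001 * (i // 10)) for i in range(100)]
    data.append((10.0, 10.0))
    print(top_k_anomalies(data, k=1))  # prints [100]
```

In this sketch every point falls into exactly one cell per tree level, so one tree costs O(max_depth · n) and the whole procedure costs O(dim · max_depth · n) plus O(n log k) for the final selection, which is linear in n for a fixed dimension and depth cap. Because the construction makes no random choices, repeated runs return the same ranking.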
Source: Journal of Xi'an Jiaotong University, 2011, No. 4, pp. 36-40 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (grants 60972146 and 60602025)
Keywords: data mining; anomaly detection; uniform partition; ordinal anomaly probability

References (10)

  • [1] ESKIN E, ARNOLD A, PRERAU M, et al. A geometric framework for unsupervised anomaly detection[J]. Advances in Information Security, 2002, 56(4): 21-31.
  • [2] BOLTON R J, HAND D J. Statistical fraud detection: a review[J]. Statistical Science, 2002, 17(3): 235-255.
  • [3] LI Xiaolei, LI Zhenhui, HAN Jiawei. Temporal outlier detection in vehicle traffic data[C]//Proc 2009 Int Conf on Data Engineering (ICDE'09). Piscataway, NJ, USA: IEEE, 2009: 1319-1322.
  • [4] PRASTAWA M, BULLITT E, HO S, et al. A brain tumor segmentation framework based on outlier detection[J]. Medical Image Analysis, 2004, 8(3): 275-283.
  • [5] HAN Jiawei, KAMBER M. Data mining: concepts and techniques[M]. Singapore: Elsevier, 2006: 5-9.
  • [6] BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF: identifying density-based local outliers[C]//Proc ACM SIGMOD'00. New York, USA: ACM, 2000: 93-104.
  • [7] LIU Fei Tony, TING Kai Ming. Isolation forest[C]//Proc IEEE Int Conf on Data Mining (ICDM'08). Piscataway, NJ, USA: IEEE, 2008: 413-422.
  • [8] YU Xiao, TANG Lu'an, HAN Jiawei. Filtering and refinement: a two-stage approach for efficient and effective anomaly detection[C]//Proc Int Conf on Data Mining (ICDM'09). Piscataway, NJ, USA: IEEE, 2009: 90-104.
  • [9] HO Y C, SREENIVAS R. Ordinal optimization of DEDS[J]. Discrete Event Dynamic Systems, 1992, 45(9): 61-88.
  • [10] KNUTH D. The art of computer programming[M]. New York, USA: Addison-Wesley, 1998: 51-53.

Citing articles (4)

Second-level citing articles (11)
