摘要
针对异常检测算法速度慢、精度低、稳定性差等问题,提出了一种通过异常概率排序提取异常点的算法(OAP).由于异常点相对正常点更容易通过对数据空间的均匀分割而孤立出来,所以OAP通过数据点在均匀N叉分割树中的孤立深度估算异常概率的大小,从而得到异常概率的排序,最终构造由k个异常概率最大的点组成的列表,列表中的数据就是所求的异常点.OAP不需要距离或密度的计算,复杂度被降到O(n)级.实验结果表明,对于规模线性增加的海量实验数据集,OAP消耗的CPU时间也线性增加;相对iForest算法,其速度提高了30倍,精度提高了20%~30%,且同一数据集上的多次实验结果一致,稳定性高.
An ordinal anomaly probability method (OAP) is proposed to improve the efficiency, effectiveness and stability of existing anomaly detection algorithms. Since anomalies are easier to be isolated by uniformly partitioning the data space, the order of anomaly probabilities can be evaluated in terms of isolation depths in uniform N-ary partition trees. Then, the k largest probable anomalies are extracted. OAP can ignore the evaluation of distance and density, and hence reduces the complexity to O(n). Experimental results show that the CPU time of OAP increases linearly with a linearly growing data set. Furthermore, comparisons show that OAP is 30 times faster than iForest and is much more stable, while its accuracy is improved by 20%-30%.
出处
《西安交通大学学报》
EI
CAS
CSCD
北大核心
2011年第4期36-40,共5页
Journal of Xi'an Jiaotong University
基金
国家自然科学基金资助项目(60972146
60602025)
关键词
数据挖掘
异常检测
均匀分割
异常概率排序
data mining
anomaly detection
uniform partition
ordinal anomaly probability