Abstract
Outlier detection has long been one of the major tasks in data mining. When applied to high-dimensional data, outlier detection algorithms based on Euclidean distance can guarantee neither detection accuracy nor acceptable running time. Building on angle-variance-based outlier detection, this paper proposes a multi-level algorithm for high-dimensional data, the hybrid outlier detection algorithm based on angle variance for high-dimensional data (HODA). The algorithm first applies rough set theory to analyze the interactions between attributes and discards those with little influence. It then analyzes the data distribution along each dimension, partitions the data into grid cells, and identifies the cells that may contain outliers. Finally, it computes the angle-variance outlier factor only for those candidate cells and uses it to screen out the outliers. Experimental results show that, compared with ABOD, FastVOA, and the classical LOF algorithm, HODA significantly reduces running time while maintaining detection accuracy, and that it scales well.
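The angle-variance idea behind the final step can be illustrated with a minimal sketch. It assumes the simplest, unweighted form of the score (the plain variance of the angles a point forms with every pair of other points, in the spirit of FastVOA), not the exact outlier factor used by HODA, which the abstract does not specify. The function names `angle` and `angle_variance` are hypothetical. A point far from the bulk of the data sees all other points in a narrow cone, so its angle variance is small.

```python
import math
from itertools import combinations


def angle(p, a, b):
    """Angle at p between the vectors p->a and p->b, in radians.

    Returns None when a or b coincides with p, since no angle is defined.
    """
    u = [x - y for x, y in zip(a, p)]
    v = [x - y for x, y in zip(b, p)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    cos = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos)))


def angle_variance(points, i):
    """Unweighted variance of the angles seen from points[i].

    Assumption: this is a simplified score for illustration only;
    a low value suggests that points[i] is an outlier.
    """
    p = points[i]
    others = points[:i] + points[i + 1:]
    angles = []
    for a, b in combinations(others, 2):
        theta = angle(p, a, b)
        if theta is not None:
            angles.append(theta)
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


# Five clustered points plus one distant point (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = [angle_variance(points, i) for i in range(len(points))]
most_outlying = min(range(len(points)), key=scores.__getitem__)
print(most_outlying, scores)
```

This brute-force version examines every pair of points for every candidate, which is cubic in the number of points. That cost is what makes restricting the computation to a few candidate grid cells, as the abstract describes, worthwhile.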
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal
2016, Issue 11, pp. 3383-3386 (4 pages)
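Funding
Qualification Project of the China Civil Aviation Information Technology Research Base, Civil Aviation University of China (CCAC-ITRB-201301)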
Keywords
high-dimensional data
outlier detection
dimensionality reduction
grid
angle variance