
Hybrid outlier detection algorithm based on angle variance for high-dimensional data (cited by: 15)
Abstract: Outlier detection has long been one of the central tasks in data mining. When applied to high-dimensional data, outlier detection algorithms based on Euclidean distance can guarantee neither detection accuracy nor acceptable running time. Building on angle-variance-based outlier detection, this paper proposes a multi-level outlier detection algorithm for high-dimensional data, the hybrid outlier detection algorithm based on angle variance for high-dimensional data (HODA). The algorithm first applies rough set theory to analyze the interactions among attributes and discard those with little influence. It then analyzes the data distribution along each dimension, partitions the data into grid cells, and identifies the cells that may contain outliers. Finally, it computes an angle-variance outlier factor only for the data in those candidate cells and uses it to screen out the outliers. Experimental results show that, compared with ABOD, FastVOA, and the classic LOF algorithm, HODA significantly shortens running time while maintaining detection accuracy, and that it scales well.
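Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 11, pp. 3383-3386 (4 pages)
Funding: Project funded by the China Civil Aviation Information Technology Research Base, Civil Aviation University of China (CCAC-ITRB-201301)
Keywords: high-dimensional data; outlier detection; dimensionality reduction; grid; angle variance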
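
The abstract's final step scores points by the variance of the angles they form with other points. The record gives no formula, so the following is only a minimal sketch of the general idea, assuming the plain unweighted variance of angles (in the spirit of FastVOA) computed by brute force. It is not the HODA algorithm itself: the rough-set attribute reduction and the grid-based candidate selection are omitted, and the function names `angle_variance` and `rank_by_angle_variance` are hypothetical.

```python
import math
from itertools import combinations


def angle_variance(point, others):
    """Variance of the angles at `point` spanned by every pair of other points.

    Assumption: unweighted variance of angles; the paper's exact outlier
    factor is not given in the abstract.
    """
    angles = []
    for a, b in combinations(others, 2):
        # Difference vectors from `point` to the two other points.
        va = [x - p for x, p in zip(a, point)]
        vb = [x - p for x, p in zip(b, point)]
        norm_a = math.sqrt(sum(x * x for x in va))
        norm_b = math.sqrt(sum(x * x for x in vb))
        if norm_a == 0.0 or norm_b == 0.0:
            continue  # a duplicate of `point`: the angle is undefined
        cos = sum(x * y for x, y in zip(va, vb)) / (norm_a * norm_b)
        # Clamp to [-1, 1] to guard against floating-point drift.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


def rank_by_angle_variance(points):
    """Return (order, scores); order lists indices from lowest variance up.

    A point far from the rest sees all other points within a narrow cone,
    so its angle variance is small and it appears early in `order`.
    """
    scores = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        scores.append(angle_variance(p, others))
    order = sorted(range(len(points)), key=lambda i: scores[i])
    return order, scores
```

Calling `rank_by_angle_variance` on a list of coordinate tuples returns the indices ordered from most to least outlying under this score. The brute-force version costs cubic time in the number of points, which is the kind of expense that restricting the computation to candidate grid cells, as the abstract describes, is meant to reduce.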

References (10)

  • 1 Zhou Donghua, Wei Muheng, Si Xiaosheng. Research progress in anomaly detection, life prediction, and maintenance decision-making for industrial processes[J]. Acta Automatica Sinica, 2013, 39(6): 711-722. Cited by: 90
  • 2 Ding Jie, Wang Lei, Shen DEan, et al. An anomaly detection system on big data[J]. Natural Science Journal of Hainan University, 2015(1).
  • 3 KDD Cup99[EB/OL]. http://kdd.ics.uci.edu/.
  • 4 Kriegel H P, Schubert M, Zimek A. Angle-based outlier detection in high-dimensional data[C]//Proc of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008: 444-452.
  • 5 Kim G, Lee S, Kim S. A novel hybrid intrusion detection method integrating anomaly detection with misuse detection[J]. Expert Systems with Applications, 2014, 41(4): 1690-1700.
  • 6 Wu Zhiyuan, Zhong Peihua, Hu Jiangen. Graded multi-granulation rough set[J]. Fuzzy Systems and Mathematics, 2014, 28(3): 165-172. Cited by: 20
  • 7 Feldman D, Schmidt M, Sohler C. Turning big data into tiny data: constant-size coresets for K-means, PCA and projective clustering[C]//Proc of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms. New York: ACM Press, 2013: 1434-1453.
  • 8 Yu Li, Lan Zhiling. A scalable, non-parametric anomaly detection framework for Hadoop[C]//Proc of ACM Cloud and Autonomic Computing Conference. New York: ACM Press, 2013.
  • 9 Huang Hongwei, Huang Tianmin. Extended clustering algorithm based on grid relative density difference[J]. Application Research of Computers, 2014, 31(6): 1702-1705. Cited by: 12
  • 10 Han Jiawei, Kamber M, Pei Jian. Data mining: concepts and techniques[M]. 3rd ed. San Francisco: Morgan Kaufmann, 2011.

