
Improved SOD Outlier Detection Algorithm
Abstract: To address the problems that the traditional SOD outlier detection algorithm has with high-dimensional data, this paper proposes an improved algorithm. The degree of aggregation in each dimension is quantified to determine how much reference value that dimension carries, which makes the results less sensitive to parameter settings. The deviation of each point from the center value is expressed as a relative distance, which makes the method better suited to detecting outliers in subspaces of different densities. Simulation results show that the improved algorithm achieves higher detection accuracy than the traditional SOD algorithm.
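Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD and the Peking University Core Journals list; 2011, No. 9, pp. 93-94, 97 (3 pages)
Funding: Hebei Province Major Technology Innovation Fund project "Integrated Production Management Information System for the Hebei Port Cluster" (09213562Z)
Keywords: high-dimensional data; subspace; outlier detection; data mining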
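
The abstract names two ideas, a per-dimension aggregation measure and a relative distance to the center value, but does not give their formulas. The Python sketch below is therefore only an illustration of how such a score could be assembled, not the paper's method. Everything in it is an assumption: the function names, the inverse-variance weighting used as the aggregation measure, the division by the standard deviation used as the relative distance, the smoothing constant `eps`, and the plain k-nearest-neighbour reference set (the original SOD algorithm builds its reference set from shared nearest neighbours instead).

```python
import math


def knn_reference_set(data, i, k):
    """Indices of the k nearest neighbours of point i (plain Euclidean distance).

    Assumption: a simplification of SOD's shared-nearest-neighbour reference set.
    """
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i], data[j]))
    return others[:k]


def weighted_relative_score(data, i, k, eps=1e-6):
    """Hypothetical outlier score: weighted, spread-normalised deviation of point i."""
    ref = [data[j] for j in knn_reference_set(data, i, k)]
    dims = len(data[i])

    # Center value and variance of the reference set in each dimension.
    means = [sum(p[d] for p in ref) / len(ref) for d in range(dims)]
    variances = [
        sum((p[d] - means[d]) ** 2 for p in ref) / len(ref) for d in range(dims)
    ]

    # Assumed aggregation measure: the tighter a dimension, the larger its weight.
    # This replaces a hard "keep / discard dimension" threshold with soft weights.
    raw = [1.0 / (v + eps) for v in variances]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Assumed relative distance: deviation measured in units of the local spread,
    # so dense and sparse neighbourhoods are scored on a comparable scale.
    return sum(
        weights[d] * abs(data[i][d] - means[d]) / (math.sqrt(variances[d]) + eps)
        for d in range(dims)
    )


# Toy data: points are tightly grouped in dimension 0 and spread out in
# dimension 1; the last point deviates in the tightly grouped dimension.
data = [
    (1.0, 0.0), (1.1, 5.0), (0.9, 10.0), (1.0, 15.0),
    (1.1, 20.0), (0.9, 25.0), (3.0, 12.0),
]
scores = [weighted_relative_score(data, i, k=3) for i in range(len(data))]
ranking = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
print(ranking[0], scores[ranking[0]])
```

With soft weights there is no threshold parameter deciding which dimensions count, which is one plausible reading of the claim that the improved algorithm is less sensitive to parameter settings. Note that a dimension with zero variance in the reference set would dominate the score in this sketch, so `eps` would need tuning on real data.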

