
New Approach to Detect Outlier which is Insensitive to Input Parameter
Cited by: 2
Abstract: Most density-based outlier detection algorithms require two input parameters and are sensitive to their values. Incorrect settings may cause an algorithm to miss meaningful outliers, or even to report false ones, so the "easy to use" requirement of the "3-E" criteria for evaluating data mining algorithms is not met. To address this, the paper first constructs a neighborhood-based local density factor (NLDF) from an object's neighborhood, reverse neighborhood, and local density; NLDF indicates the degree of outlierness of an object. It then proposes ODINP, an outlier detection algorithm that is insensitive to its input parameter. A notable advantage of ODINP is that it needs only one parameter, k, and is insensitive to the value of k. The algorithm retains the efficiency of existing density-based outlier detection algorithms while achieving high detection precision. Experiments on large-scale, arbitrarily shaped, and high-dimensional data sets show that the algorithm is effective and feasible.
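Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journals, 2008, No. 12, pp. 192-195, 206 (5 pages)
Funding: Supported by the National High Technology Research and Development Program of China (863 Program), grant 2007AA01Z404
Keywords: data mining, outlier detection, parameter, neighborhood, density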
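
The abstract names three ingredients of NLDF (neighborhood, reverse neighborhood, and local density) but does not give the formula. The sketch below only illustrates how those three quantities can be computed with a single parameter k and combined into a ratio score. The density definition (inverse k-distance), the combined score, and all function names are assumptions made for illustration; this is not the NLDF formula or the ODINP algorithm from the paper.

```python
import math

def knn_sets(points, k):
    """Return (neighbors, kdist): the k nearest neighbor indices and the
    distance to the k-th nearest neighbor for every point. Requires k < len(points)."""
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])
    return neighbors, kdist

def reverse_sets(neighbors):
    """Reverse neighborhood of i: every point that lists i among its k nearest neighbors."""
    reverse = [[] for _ in neighbors]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            reverse[j].append(i)
    return reverse

def outlier_scores(points, k):
    """Hypothetical stand-in score (NOT the paper's NLDF): mean density over
    neighborhood plus reverse neighborhood, divided by the point's own density.
    Larger values suggest a point is sparser than its surroundings."""
    neighbors, kdist = knn_sets(points, k)
    reverse = reverse_sets(neighbors)
    # Assumed density: inverse k-distance; the epsilon guards against duplicates.
    density = [1.0 / (d + 1e-12) for d in kdist]
    scores = []
    for i in range(len(points)):
        space = set(neighbors[i]) | set(reverse[i])
        avg = sum(density[j] for j in space) / len(space)
        scores.append(avg / density[i])
    return scores

if __name__ == "__main__":
    # A tight cluster of five points plus one isolated point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for p, s in zip(pts, outlier_scores(pts, k=2)):
        print(p, round(s, 3))
```

In this toy data set the isolated point has an empty reverse neighborhood and a much lower density than its neighbors, so it receives the largest score. The brute-force neighbor search is quadratic in the number of points and is used here only for readability.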


