ISAD: A New Outlier Detection Algorithm Based on the Sum of Attribute Distances (cited by: 5)

An Algorithm for Outlier Detection Based on the Sum of Attribute Distances
Abstract: Outliers arise when data objects fluctuate on certain attributes (dimensions). Based on this observation, the paper introduces the concept of a key attribute to describe the attributes that affect data stability. In real datasets, only some of the attributes are key attributes that determine whether a data object is an outlier. The paper therefore defines the degree of membership of a key attribute, gives an algorithm for computing it, and on that basis proposes a new outlier detection algorithm based on the sum of attribute distances. Experimental results show that, compared with the cell-based algorithm, the new algorithm is significantly better in both efficiency and scalability with respect to dimensionality.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, PKU Core Journal, 2009, No. 3, pp. 83-85, 88 (4 pages)
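Funding: National Natural Science Foundation of China (60773100); Key Project of Science and Technology Research of the Ministry of Education (205014); Research Program of the Hebei Provincial Department of Education (2006143)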
Keywords: outlier; key attribute; degree of membership; sum of attribute distances
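
The abstract names the two ingredients of the method (a per-attribute degree of membership and a sum of attribute distances) but does not give their formulas. The following is only an illustrative sketch of the general idea, not the authors' ISAD algorithm. Everything in it is an assumption: the membership degrees are supplied by the caller as `weights`, the distance on one attribute is taken to be the absolute deviation from that attribute's median divided by its range, and the function names `attribute_distance_sums` and `top_outliers` are hypothetical.

```python
from statistics import median


def attribute_distance_sums(rows, weights):
    """Score each row by a weighted sum of per-attribute distances.

    Assumption (not from the paper): the distance of a value on one
    attribute is its absolute deviation from the attribute's median,
    divided by the attribute's range so that attributes are comparable.
    `weights` stands in for the key-attribute membership degrees.
    """
    columns = list(zip(*rows))
    centers = [median(col) for col in columns]
    # Fall back to 1.0 for a constant attribute to avoid division by zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in columns]
    scores = []
    for row in rows:
        score = sum(
            w * abs(x - c) / r
            for x, c, r, w in zip(row, centers, ranges, weights)
        )
        scores.append(score)
    return scores


def top_outliers(rows, weights, n):
    """Return the indices of the n rows with the largest distance sums."""
    scores = attribute_distance_sums(rows, weights)
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Toy data: the last row deviates strongly on the second attribute only.
data = [
    (1.0, 10.0, 5.0),
    (1.1, 10.2, 5.1),
    (0.9, 9.8, 4.9),
    (1.0, 10.1, 5.0),
    (1.0, 30.0, 5.0),
]
weights = [0.2, 0.6, 0.2]  # hypothetical membership degrees
print(top_outliers(data, weights, 1))
```

Because each attribute is scored independently and the scores are simply added, the cost of this sketch grows linearly with the number of attributes, which is the kind of dimensional scalability the abstract claims over cell-based methods.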
