
Outlier Detection Algorithm Based on Approximate Outlier Factor (original title: 基于相似孤立系数的孤立点检测算法)

Cited by: 4
Abstract: Clustering-based outlier detection algorithms produce coarse and insufficiently accurate results. To address this problem, this paper proposes an outlier detection algorithm based on the Approximate Outlier Factor (AOF). The algorithm defines a similarity distance and an approximate outlier factor, and introduces a pruning strategy based on the similarity distance. This strategy shrinks the candidate set of suspected outliers and thereby lowers the computational complexity of outlier detection. Experiments on the public datasets Iris, Labor, and Segment-test show that, compared with classical outlier detection algorithms, the proposed algorithm is more effective both at detecting outliers and at reducing the candidate set.
Source: Computer Engineering (indexed in CAS and CSCD), 2013, No. 11, pp. 200-204 (5 pages).
Funding: National Key Technology R&D Program of China (2012BAH08B01); Natural Science Foundation of Hunan Province (12JJ3074).
Keywords: clustering outlier; outlier detection; Approximate Outlier Factor (AOF); pruning strategy; outlier candidate set
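The abstract does not give the formulas for the similarity distance or the approximate outlier factor, so the following is only a minimal Python sketch of the general prune-then-refine idea it describes, under clearly stated assumptions. A coarse k-means clustering stands in for the clustering step. The distance from a point to its nearest centroid is a hypothetical stand-in for the similarity distance, and the mean distance to the k nearest neighbours is a hypothetical stand-in for the AOF. All function and parameter names (`detect_outliers`, `candidate_ratio`, `top_m`, and so on) are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[List[float]]:
    """Coarse clustering step: plain Lloyd's k-means seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def similarity_distance(p: Point, centroids: Sequence[Sequence[float]]) -> float:
    """HYPOTHETICAL stand-in for the paper's similarity distance:
    distance from p to its nearest cluster centroid (cheap, O(k) per point)."""
    return min(math.dist(p, c) for c in centroids)


def outlier_factor(i: int, points: Sequence[Point], k: int) -> float:
    """HYPOTHETICAL stand-in for the AOF: mean distance from points[i]
    to its k nearest neighbours (expensive, O(n) per point)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def detect_outliers(
    points: Sequence[Point],
    n_clusters: int = 2,
    candidate_ratio: float = 0.3,
    k: int = 2,
    top_m: int = 1,
) -> Tuple[List[int], List[Tuple[Point, float]]]:
    """Prune with the cheap score, then rank the survivors with the expensive one.
    Returns (candidate indices, top_m (point, score) pairs)."""
    centroids = kmeans(points, n_clusters)
    sims = [similarity_distance(p, centroids) for p in points]
    # Pruning: only the points farthest from every centroid stay suspect.
    n_candidates = max(1, math.ceil(candidate_ratio * len(points)))
    order = sorted(range(len(points)), key=lambda i: sims[i], reverse=True)
    candidates = order[:n_candidates]
    # Refinement: about len(candidates) * n distance computations instead of n^2.
    scored = [(points[i], outlier_factor(i, points, k)) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return candidates, scored[:top_m]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 20)]
    candidates, top = detect_outliers(data)
    print(len(candidates), "of", len(data), "points scored in full")
    print(top)
```

On the toy data, only 3 of the 9 points reach the neighbour-based scoring, and the isolated point (5, 20) ranks first. The design trade-off is the pruning threshold (here `candidate_ratio`): pruning harder saves more distance computations but raises the risk of discarding a true outlier before it is scored. The paper's own pruning rule may bound this differently.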