
A Density-Neighbors-Based Incremental Outlier Detection Algorithm
Abstract To address incremental outlier detection when a dataset is updated, a density-neighbors-based incremental outlier detection algorithm is proposed. When the dataset is updated, the algorithm first identifies the affected objects and then builds a density neighbor sequence for each of them from the changes in k-density between the object and its neighbors. From the density neighbor sequence cost (DNSC) of an object and the average DNSC over its k-distance neighborhood, the algorithm computes an incremental outlier factor (IOF) for each affected object; the IOF value indicates the degree to which the object is an outlier, which improves the effectiveness of incremental outlier detection. Because only the IOF values of the affected objects need to be recalculated, the algorithm also speeds up detection. Experimental results show that the proposed algorithm detects outliers more effectively than earlier incremental algorithms while also reducing running time.
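The abstract gives no formulas for the k-density, DNSC, or IOF, so the following is only a minimal sketch of the first step it describes: finding the objects affected by an update. It assumes that an existing object counts as "affected" by an insertion when the new object falls inside its k-distance neighborhood, a criterion common in incremental LOF-style methods; the paper's own criterion may differ. The function names `k_distance` and `affected_by_insert` are hypothetical, and the brute-force search is for illustration only.

```python
import math


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point (brute force).

    Assumes len(points) > k.
    """
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return dists[k - 1]


def affected_by_insert(points, new_point, k):
    """Indices of existing points whose k-distance neighborhood would
    contain new_point (assumed definition of 'affected', see lead-in)."""
    affected = []
    for i, p in enumerate(points):
        if math.dist(p, new_point) <= k_distance(points, i, k):
            affected.append(i)
    return affected


# Two small, well-separated clusters; the new point lands in the first one.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(affected_by_insert(pts, (0.5, 0.5), k=2))
```

Under this assumption only the three objects in the first cluster would need their outlier scores recomputed, which is the source of the speed-up the abstract attributes to recalculating IOF for affected objects alone.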
Source Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2009, No. 6, pp. 931-935 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding Supported by the National 863 Program of China (No. 2006AA04Z180)
Keywords Outlier Detection, Incremental Algorithm, Density Neighbor, Incremental Outlier Factor (IOF)











