Outlier Detection Based on the Damped Model in Mixed Data Streams (cited by: 1)
Abstract Outlier detection in data streams is difficult because memory is limited and detection must run in real time. This paper presents a fast outlier detection algorithm for mixed-attribute data streams. The stream is clustered incrementally under the damped model, producing a set of cluster features that represents the data distribution, while the radius threshold is adjusted dynamically. When a detection request arrives, the outlier factor is computed for every cluster that satisfies a given condition, and clusters with a high outlier factor are reported as outliers. A method is also proposed to distinguish outlier clusters from the initial stage of data evolution. The time and space complexity of the algorithm are approximately linear in the size of the data stream. Experiments on the real-world KDDCUP99 dataset show that the algorithm effectively detects outliers in mixed-attribute data streams.
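The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the decay function 2^(-lam * dt), the distance (Euclidean on numeric attributes plus a mismatch rate on categorical ones), the fixed `radius` (the paper adjusts its radius threshold dynamically), the `small` weight cutoff that selects candidate clusters, the `min_age` rule that stands in for separating outlier clusters from newly emerging data, and the outlier factor itself (weighted mean distance to the other clusters).

```python
import math
from collections import defaultdict


class ClusterFeature:
    """Decayed summary of one cluster of mixed (numeric + categorical) records."""

    def __init__(self, num, cat, t):
        self.weight = 1.0                                   # decayed point count
        self.num_sum = list(num)                            # decayed numeric sums
        self.cat_freq = [defaultdict(float) for _ in cat]   # decayed value counts
        for freq, v in zip(self.cat_freq, cat):
            freq[v] = 1.0
        self.created = self.updated = t

    def decay(self, t, lam):
        # Assumed damped model: every statistic fades by 2^(-lam * elapsed).
        f = 2.0 ** (-lam * (t - self.updated))
        self.weight *= f
        self.num_sum = [s * f for s in self.num_sum]
        for freq in self.cat_freq:
            for v in freq:
                freq[v] *= f
        self.updated = t

    def absorb(self, num, cat):
        self.weight += 1.0
        self.num_sum = [s + x for s, x in zip(self.num_sum, num)]
        for freq, v in zip(self.cat_freq, cat):
            freq[v] += 1.0

    def center(self):
        # Numeric mean and categorical mode of the cluster.
        num = [s / self.weight for s in self.num_sum]
        cat = [max(freq, key=freq.get) for freq in self.cat_freq]
        return num, cat

    def distance(self, num, cat):
        # Hypothetical mixed distance: Euclidean part plus categorical mismatch.
        d_num = math.sqrt(sum((x - s / self.weight) ** 2
                              for x, s in zip(num, self.num_sum)))
        d_cat = sum(1.0 - freq.get(v, 0.0) / self.weight
                    for freq, v in zip(self.cat_freq, cat))
        return d_num + d_cat


class DampedStreamClusterer:
    def __init__(self, radius=1.0, lam=0.01):
        self.radius, self.lam, self.clusters = radius, lam, []

    def _fade(self, t):
        for c in self.clusters:
            c.decay(t, self.lam)
        # Drop clusters whose weight has faded to almost nothing.
        self.clusters = [c for c in self.clusters if c.weight > 1e-6]

    def insert(self, num, cat, t):
        self._fade(t)
        best = min(self.clusters, key=lambda c: c.distance(num, cat), default=None)
        if best is not None and best.distance(num, cat) <= self.radius:
            best.absorb(num, cat)
        else:
            self.clusters.append(ClusterFeature(num, cat, t))

    def detect(self, t, small=3.0, min_age=5.0):
        """Return (outlier_factor, cluster) pairs, highest factor first."""
        self._fade(t)
        scored = []
        for c in self.clusters:
            if c.weight >= small or t - c.created < min_age:
                continue  # large cluster, or too young to tell from new data
            others = [o for o in self.clusters if o is not c]
            total = sum(o.weight for o in others)
            if total == 0:
                continue
            num, cat = c.center()
            factor = sum(o.weight * o.distance(num, cat) for o in others) / total
            scored.append((factor, c))
        return sorted(scored, key=lambda p: p[0], reverse=True)
```

Each record is passed to `insert` once with its timestamp, and `detect` is called only when a detection request arrives, so memory grows with the number of clusters rather than with the length of the stream.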
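Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2010, No. 5, pp. 157-161 (5 pages)
Funding Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01A120) and the National Natural Science Foundation of China (10871040)
Keywords mixed attribute; data streams; incremental clustering; outlier detection; damped model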


