
Frequent items mining algorithm over network flows based on time and flow length constraints (cited by: 3)
Abstract: Counting-based methods are among the most active approaches to frequent items mining over data streams. Inspired by the SS counting algorithm and tailored to the actual characteristics of network flows, this paper proposes TSFIM, a frequent items mining algorithm for network flows whose pruning operation is constrained by both time and flow length. TSFIM uses a three-level buffer structure to provide, respectively, prompt protection of long flows, time-based packet aggregation, and flow-length-based differentiated eviction of flow items. The performance of TSFIM is analyzed theoretically, and the constraints under which it suits long-running operation, together with its advantages in that setting, are discussed. Finally, tests on real traffic data show that TSFIM achieves very high space utilization and clearly outperforms SS and similar algorithms in both frequent flow item extraction and flow length statistics.
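Source: Journal of University of Science and Technology of China (JUSTC), 2013, No. 10, pp. 790-798 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the Key Project of the Natural Science Foundation of Shaanxi Province (2012JZ8005).
Keywords: network flows; frequent items mining; counting method; pruning strategy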
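
For orientation, the counting baseline named in the abstract can be sketched as follows. This is a minimal sketch and not the paper's TSFIM: it assumes that "SS" refers to the Space-Saving algorithm of Metwally et al., and it omits the three-level buffers and the time and flow-length constraints, which the abstract does not specify in detail. The class name `SpaceSaving`, its methods, and the sample packet sequence are hypothetical names chosen for illustration.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Space-Saving counter that monitors at most `capacity` items (assumed baseline)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count per monitored item
        self.errors: Dict[Hashable, int] = {}  # upper bound on overestimation per item

    def update(self, item: Hashable, weight: int = 1) -> None:
        """Account for one observation of `item` (for example, one packet of a flow)."""
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.capacity:
            self.counts[item] = weight
            self.errors[item] = 0
        else:
            # Table is full: evict the item with the smallest estimated count.
            # The newcomer inherits that count, which bounds its overestimation.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + weight
            self.errors[item] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return the k monitored items with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


# Hypothetical packet sequence: each element is the flow identifier of one packet.
packets = ["A"] * 5 + ["B"] * 3 + ["C", "D", "A", "E"]

ss = SpaceSaving(capacity=3)
for flow_id in packets:
    ss.update(flow_id)

print(ss.top(2))
```

In this sketch the eviction rule looks only at the smallest counter, so a short flow that arrives late can displace a monitored flow and inherit its count. The abstract presents TSFIM's pruning as additionally constrained by time and flow length, which this baseline does not model.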

