Abstract
After analyzing the strengths and weaknesses of counter-based algorithms for mining frequent items over data streams, and taking the actual characteristics of network flows into account, this paper proposes CBF-TSFIM, a frequent-item mining algorithm for network flows that combines hashing with counting. The algorithm first uses an improved counting Bloom filter (CBF) to filter out part of the infrequent flows without storing any flow information, which greatly reduces the number of flows that need further processing. It then applies TSFIM (time-space based frequent items mining), a frequent-item mining algorithm constrained by time and flow length, to extract the frequent items. Experiments on real network traffic show that CBF-TSFIM has very high space utilization and clearly outperforms algorithms such as Space Saving (SS) in both frequent-item identification and flow-length counting.
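The two-stage idea in the abstract (a hash-based pre-filter followed by counting) can be illustrated with a minimal Python sketch. This is not the authors' CBF-TSFIM: the abstract does not describe how the CBF was improved or how TSFIM applies its time and flow-length constraints, so the sketch uses a plain counting Bloom filter and an exact counter as stand-ins. The names `CountingBloomFilter`, `mine_frequent_flows`, `filter_threshold`, and `report_threshold`, as well as the filter size and hash count, are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


class CountingBloomFilter:
    """Plain counting Bloom filter (stand-in for the paper's improved CBF)."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Derive one counter index per hash from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(key.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        """Increment the key's counters and return its estimated count."""
        positions = list(self._positions(key))
        for pos in positions:
            self.counters[pos] += 1
        # The minimum counter never underestimates the true count.
        return min(self.counters[pos] for pos in positions)


def mine_frequent_flows(packets, filter_threshold=3, report_threshold=5):
    """Stage 1: CBF pre-filter. Stage 2: exact counting (assumed stand-in for TSFIM)."""
    cbf = CountingBloomFilter()
    tracked = Counter()
    for flow_id in packets:
        if flow_id in tracked:
            # Flow already passed the filter: count it individually.
            tracked[flow_id] += 1
        elif cbf.add(flow_id) >= filter_threshold:
            # Flow looks large enough: start keeping per-flow state.
            tracked[flow_id] = 1
    return {flow: n for flow, n in tracked.items() if n >= report_threshold}


if __name__ == "__main__":
    # Hypothetical trace: one long flow and twenty single-packet flows.
    trace = ["heavy"] * 10 + [f"mouse{i}" for i in range(20)]
    print(mine_frequent_flows(trace))
```

In this sketch, per-flow state is created only for flows whose estimated count reaches `filter_threshold`, so short flows normally never reach the second stage. The reported count omits the packets absorbed by the filter before the flow was tracked, so it is a lower bound on the true flow length.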
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2013, No. 9, pp. 57-62 (6 pages)
Journal of Huazhong University of Science and Technology (Natural Science Edition)
Funding
Supported by the Natural Science Foundation of Shaanxi Province (2012JZ8005)