摘要
滑动窗口是一种对最近一段时间内的数据进行挖掘的有效的技术,本文提出一种基于滑动窗口的流数据频繁项挖掘算法.算法采用了链表队列策略大大简化了算法,提高了挖掘的效率.对于给定的阈值S、误差ε和窗口长度n,算法可以检测在窗口内频度超过Sn的数据流频繁项,且使误差在εn以内.算法的空间复杂度为O(ε-1),对每个数据项的处理和查询时间均为O(1).在此基础上,我们还将该算法进行了扩展,可以通过参数的变化得到不同的流数据频繁项挖掘算法,使得算法的时间和空间复杂度之间得到调节.通过大量的实验证明,本文算法比其它类似算法具有更好的精度以及时间和空间效率.
The sliding window is an effective approach to mine frequent data itmes in the recent period of time.proposed an algorithm for mining frequent items in a stream based on sliding window.The algorithm adopts linked queue strategy which greatly improves the efficiency of the algorithm.Given a threshold S,an error bound ε and the length of the sliding window n,our algorithm can determinately detect the data items within the current window whose frequncy exceeds Sn with an error less than εn using O(ε-1) memory space,and the processing time for each data item and the query time are both O(1).Based on this algorithm,we have proposed a general framework for mining frequent items in data stream based on sliding window.Under this framework,different algorithms can be constructed by changing the parameters which could adjust the time complexity and space cost of the algorithm.Through extensive experiments,we show that our algorithm outperforms other methods in terms of the accuracy,memory requirement,and processing speed.
出处
《小型微型计算机系统》
CSCD
北大核心
2012年第5期940-949,共10页
Journal of Chinese Computer Systems
基金
国家自然科学基金项目(61070047
61003180)资助
江苏省自然科学基金项目(BK2008206
BK2010311)资助
江苏省教育厅自然科学基金项目(09KJB20013)资助
江苏省信息融合软件工程技术研发中心基金项目(SR-2011-05)资助
江苏省普通高校研究生科研创新计划项目(CX08B_098Z)资助
关键词
数据流
频繁项
滑动窗口
data stream
frequent item
sliding window