Abstract
To address the shortcomings of existing algorithms for mining maximal frequent itemsets over data streams, this paper proposes MFISW, a vector-based algorithm for mining maximal frequent itemsets in a sliding window over a data stream. First, the algorithm uses vectors as the synopsis data structure and applies a quantitative (fixed-amount) update strategy to the sliding window to resolve the time-granularity problem. Second, it generates frequent itemsets through bitwise operations, stores auxiliary information in a matrix and an array, and applies pruning during the depth-first search for maximal frequent itemsets to further reduce mining time. Finally, it stores the mining results in an indexed linked list to make superset checking more efficient. Theoretical analysis and experimental results confirm the effectiveness of the algorithm.
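The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MFISW implementation: the names `BitVectorWindow` and `maximal_frequent_itemsets`, the use of Python integers as bit vectors, the batch-wise window update, and the size-keyed dictionary standing in for the indexed linked list are all assumptions made for illustration. The matrix and array of auxiliary information mentioned in the abstract are not modelled.

```python
class BitVectorWindow:
    """Sliding window in which each item owns a bit vector (a Python int).

    Bit i of an item's vector is set when the i-th oldest transaction in
    the window contains that item. Hypothetical stand-in for the paper's
    vector synopsis; assumes batch_size <= window_size.
    """

    def __init__(self, window_size, batch_size):
        self.window_size = window_size
        self.batch_size = batch_size
        self.count = 0      # transactions currently in the window
        self.vectors = {}   # item -> bit vector
        self.pending = []   # transactions waiting for the next batch update

    def add(self, transaction):
        self.pending.append(frozenset(transaction))
        if len(self.pending) == self.batch_size:
            self._flush()

    def _flush(self):
        # Update by a fixed amount: evict as many old transactions as needed.
        overflow = max(0, self.count + len(self.pending) - self.window_size)
        if overflow:
            shifted = {i: v >> overflow for i, v in self.vectors.items()}
            self.vectors = {i: v for i, v in shifted.items() if v}
            self.count -= overflow
        for transaction in self.pending:
            for item in transaction:
                self.vectors[item] = self.vectors.get(item, 0) | (1 << self.count)
            self.count += 1
        self.pending = []


def popcount(vector):
    return bin(vector).count("1")


def maximal_frequent_itemsets(vectors, min_support):
    """Depth-first search; support of an itemset is the popcount of the AND."""
    items = sorted(i for i, v in vectors.items() if popcount(v) >= min_support)
    by_size = {}  # itemset size -> found maximal itemsets (simple index)

    def is_subsumed(itemset):
        # Only stored sets at least as large as `itemset` can contain it.
        return any(itemset <= m
                   for size, group in by_size.items() if size >= len(itemset)
                   for m in group)

    def dfs(prefix, prefix_vec, candidates):
        # Pruning: skip the subtree if prefix plus all candidates is covered.
        if is_subsumed(prefix | set(candidates)):
            return
        extended = False
        for k, item in enumerate(candidates):
            vec = vectors[item] if prefix_vec is None else prefix_vec & vectors[item]
            if popcount(vec) >= min_support:
                extended = True
                dfs(prefix | {item}, vec, candidates[k + 1:])
        if not extended and prefix and not is_subsumed(prefix):
            by_size.setdefault(len(prefix), []).append(prefix)

    dfs(frozenset(), None, items)
    return [m for group in by_size.values() for m in group]


window = BitVectorWindow(window_size=4, batch_size=2)
for t in ["abc", "ab", "ac", "bc", "abc", "ab"]:
    window.add(t)
result = maximal_frequent_itemsets(window.vectors, min_support=2)
print(sorted(sorted(m) for m in result))
# prints [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

In this sketch the window ends up holding the last four transactions, so eviction is a single right shift per item and support counting is a bitwise AND followed by a population count. Grouping results by size keeps the superset check from scanning sets that are too small to contain the candidate.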
Source
《计算机应用研究》
CSCD
PKU Core Journal
2012, No. 3, pp. 837-840 (4 pages)
Application Research of Computers
Funding
National "863" Program of China (2007AA01Z443)
Chengdu University Fund (2010XJZ16)
Keywords
data stream
maximal frequent itemsets
sliding window
vector