
Mining Full Weighted Maximal Frequent Itemsets Based on Sliding Window over Data Stream (滑动窗口下数据流完全加权最大频繁项集挖掘)

Cited by: 2
Abstract: Existing work on weighted maximal frequent itemsets (WMFI) over data streams cannot effectively handle WMFI mining when the frequent threshold differs from the weighted frequent threshold. To address this, the concept of full weighted maximal frequent itemsets (FWMFI) is proposed. To reduce the redundant operations of the naive algorithm when mining FWMFI over a sliding window, the FWMFI-SW (FWMFI mining based on sliding window over data stream) algorithm is proposed. The algorithm applies an optimization strategy based on the frequent constraint to reduce the number of unnecessary calls to the MaxW optimization strategy made by the naive algorithm. In addition, it uses the edit distance ratio as the reconstruction judge function that decides whether the updated WMFP-SW-tree should be reconstructed as the window slides, which reduces the number of times the tree is rebuilt. Experimental results show that FWMFI-SW is effective and outperforms the naive algorithm in running time.
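Source: Journal of Northeastern University (Natural Science) (《东北大学学报(自然科学版)》), 2016, No. 7, pp. 931-936 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60903159, 61173153, 61402096); Fundamental Research Funds for the Central Universities (N110818001, N100218001, N130504007, N120104001); Shenyang Science and Technology Plan Project (1091176-1-00); National High Technology Research and Development Program of China (2015AA016005).
Keywords: data stream; sliding window; edit distance ratio; weighted maximal frequent itemsets; reconstruction judge function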
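
The abstract names the edit distance ratio as the reconstruction judge function for the WMFP-SW-tree but does not give its formula. The Python sketch below illustrates one plausible reading, under assumptions that are not taken from the paper: the ratio is the Levenshtein distance between the item ordering used by the current tree and the ordering implied by the updated window, divided by the longer ordering's length, and the tree is rebuilt only when that ratio exceeds a threshold. The function names and the default threshold of 0.5 are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two item sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_distance_ratio(old_order: Sequence[str], new_order: Sequence[str]) -> float:
    """Assumed normalization: distance divided by the longer ordering's length."""
    longest = max(len(old_order), len(new_order))
    if longest == 0:
        return 0.0
    return edit_distance(old_order, new_order) / longest


def should_reconstruct(old_order: Sequence[str], new_order: Sequence[str],
                       threshold: float = 0.5) -> bool:
    """Hypothetical judge function: rebuild only when the ordering drifted enough."""
    return edit_distance_ratio(old_order, new_order) > threshold


if __name__ == "__main__":
    tree_order = ["a", "b", "c", "d"]    # ordering the tree was built with
    window_order = ["a", "c", "b", "d"]  # ordering after the window slides
    print(edit_distance_ratio(tree_order, window_order))
    print(should_reconstruct(tree_order, window_order))
```

The point of a thresholded ratio is that small changes in item ordering leave the existing tree in place, so a full rebuild is paid for only when the ordering has changed substantially.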
