An Algorithm to Approximately Mine Frequent Closed Itemsets from Data Streams

Cited by: 14
Abstract: Mining frequent itemsets from data streams has been studied extensively, and most traditional approaches focus on finding the complete set of frequent itemsets in a stream. Mining all frequent itemsets produces redundant data and patterns, which places heavy demands on both the time and the space efficiency of an algorithm. In recent years, researchers have therefore turned to mining frequent closed itemsets in data streams, and the Moment algorithm is a typical example of this work. This paper proposes A-Moment, an algorithm for approximately mining frequent closed itemsets from data streams. It uses a damped window mechanism, an approximate count estimation method, and a distributed strategy for updating information to overcome two weaknesses of Moment: its heavy dependence on the window and its low execution efficiency. Experiments show that A-Moment preserves mining accuracy while running more efficiently than Moment.
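The abstract does not give A-Moment's data structures or update rules. The Python below is a minimal sketch that shows what "frequent closed itemsets under a damped window" means. It assumes the common exponential-decay form of a damped window, in which every older transaction's weight is multiplied by a decay factor d on each new arrival. An itemset X is closed when no proper superset has the same support. A superset's supporting transactions are a subset of X's, and all weights are positive, so equal damped support still implies the same transactions and closedness keeps its usual meaning. The class `DampedItemsetCounter` and its methods are hypothetical. The class counts every subset exactly and does not use A-Moment's approximate counting or distributed update strategy.

```python
from fractions import Fraction
from itertools import combinations


class DampedItemsetCounter:
    """Hypothetical toy counter for illustration, not A-Moment itself.

    Keeps an exact damped support for every itemset seen: each arrival
    multiplies all older weights by `decay`, and the new transaction has
    weight 1. It enumerates all subsets of each transaction, so it only
    suits tiny inputs.
    """

    def __init__(self, decay):
        self.decay = Fraction(decay)   # exact arithmetic keeps equality tests exact
        self.counts = {}               # frozenset -> damped support
        self.total = Fraction(0)       # damped number of transactions seen

    def add(self, transaction):
        # Age every stored support (and the total) by one time step.
        self.counts = {k: v * self.decay for k, v in self.counts.items()}
        self.total = self.total * self.decay + 1
        items = sorted(set(transaction))
        # The new transaction adds weight 1 to each of its nonempty subsets.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                self.counts[key] = self.counts.get(key, Fraction(0)) + 1

    def frequent_closed(self, min_support):
        """Closed itemsets whose damped support is at least min_support * total."""
        threshold = Fraction(min_support) * self.total
        result = {}
        for itemset, count in self.counts.items():
            if count < threshold:
                continue
            # Closed: no proper superset has exactly the same damped support.
            if all(not (itemset < other and other_count == count)
                   for other, other_count in self.counts.items()):
                result[itemset] = count
        return result


if __name__ == "__main__":
    counter = DampedItemsetCounter(Fraction(1, 2))
    for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]:
        counter.add(t)
    closed = counter.frequent_closed(Fraction(1, 2))
    for itemset, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
    # prints:
    # ['a'] 7/4
    # ['a', 'c'] 3/2
```

With d = 1/2, the three transactions end up with weights 1/4, 1/2 and 1, so the damped total is 7/4 and the threshold for min_support = 1/2 is 7/8. {c} passes the threshold but is not closed, because its superset {a, c} has the same damped support of 3/2. A streaming algorithm such as the one described in the abstract would avoid storing every subset. This toy version trades that efficiency for exactness.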
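Source: Acta Electronica Sinica (电子学报), 2007, No. 5, pp. 900-905 (6 pages); indexed in EI, CAS, CSCD and the Peking University Core Journals list
Funding: Major Program of the National Natural Science Foundation of China (No. 60496322, No. 60496327)
Keywords: data mining; data stream; frequent closed itemset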

References (13)

  • [1] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases[A]. Proceedings of the 20th International Conference on Very Large Data Bases[C]. San Francisco: Morgan Kaufmann, 1994. 487-499.
  • [2] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation[A]. 2000 ACM SIGMOD International Conference on Management of Data[C]. Dallas: ACM Press, 2000. 1-12.
  • [3] Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules[A]. Proceedings of the 7th International Conference on Database Theory[C]. Jerusalem, Israel: Springer, 1999. 398-416.
  • [4] Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets[A]. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery[C]. Dallas: ACM Press, 2000. 21-30.
  • [5] Burdick D, Calimlim M, Gehrke J. MAFIA: a maximal frequent itemset algorithm for transactional databases[A]. Proceedings of the 17th International Conference on Data Engineering[C]. Heidelberg: IEEE Computer Society Press, 2001. 443-452.
  • [6] Zaki M, Hsiao C. CHARM: an efficient algorithm for closed association rule mining[R]. New York: RPI, 1999.
  • [7] Zhu Y, Shasha D. StatStream: statistical monitoring of thousands of data streams in real time[A]. Proceedings of the 28th International Conference on Very Large Data Bases[C]. Hong Kong, China: Morgan Kaufmann, 2002. 358-369.
  • [8] Giannella C, Han J, Robertson E, Liu C. Mining frequent itemsets over arbitrary time intervals in data streams[R]. Bloomington: Indiana University, 2003.
  • [9] Manku G, Motwani R. Approximate frequency counts over data streams[A]. Proceedings of the 28th International Conference on Very Large Data Bases[C]. Hong Kong, China: Morgan Kaufmann, 2002. 346-357.
  • [10] Teng W-G, Chen M-S, Yu P S. A regression-based temporal pattern mining scheme for data streams[A]. Proceedings of the 29th International Conference on Very Large Data Bases[C]. Berlin, Germany: Morgan Kaufmann, 2003. 607-617.

Secondary references (8)

  • [1] Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Beeri C, et al, eds. Proc. of the 7th Int'l. Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999. 398-416.
  • [2] Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Beeri C, et al, eds. Proc. of the 20th Int'l. Conf. on Very Large Data Bases. Santiago: Morgan Kaufmann Publishers, 1994. 487-499.
  • [3] Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Gunopulos D, et al, eds. Proc. of the 2000 ACM SIGMOD Int'l. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21-30.
  • [4] Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Georgakopoulos D, et al, eds. Proc. of the 17th Int'l. Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443-452.
  • [5] Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Grossman R, et al, eds. Proc. of the 2nd SIAM Int'l. Conf. on Data Mining. Arlington: SIAM, 2002. 12-28.
  • [6] Liu JQ, Pan YH, Wang K, Han J. Mining frequent item sets by opportunistic projection. In: Hand D, et al, eds. Proc. of the 8th ACM SIGKDD Int'l. Conf. on Knowledge Discovery and Data Mining. Alberta: ACM Press, 2002. 229-238.
  • [7] Srikant R. Quest synthetic data generation code. San Jose: IBM Almaden Research Center, 1994. http://www.almaden.ibm.com/software/quest/Resources/index.shtml
  • [8] Blake C, Merz C. UCI Repository of machine learning. Irvine: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html

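Documents sharing references with this article: 18

Documents co-cited with this article: 122

Citing documents: 14

Second-level citing documents: 59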
