期刊文献+

基于索引数组和复合频繁模式树的频繁闭项集挖掘算法 被引量:1

Closed Itemset Mining Algorithm Based on Index Array and Compound Frequent Itemset Tree
下载PDF
导出
摘要 频繁闭项集惟一确定频繁项集且规模小得多。CROP是一种基于复合频繁模式树的、频繁闭项集高效挖掘算法,但存在着候选结点过多的问题。这些非闭合结点的生成、检查和剪裁带来了大量不必要的操作。提出了一种改进的频繁闭项集挖掘算法CROP_Index。该算法用"索引数组"来组织数据,找到频繁共同出现的项集。基于二进制位图,给出了一个包含索引的计算方法,并利用索引启发信息合并,得到复合型频繁模式树的初始结点;同时给出一些新的性质,使得改进的算法只生成闭合结点,从而节省了大量不必要的操作,缩小了搜索空间。实验结果表明该算法效率较高。 The set of frequent closed itemsets determines exactly the complete set of all frequent itemsets and is usually much smaller than the latter. Based on compound frequent itemset tree (CFIST), CROP is an efficient algorithm for mining frequent closed itemset. However, there are a lot of redundant candidate nodes of CFIST. Correspondingly, the operations, composed of the generation, closure checking, and pruning of those non-closed nodes, lead to high computational cost. In this paper, CROP Index, which is an improved algorithm for mining frequent closed itemset, is proposed. Firstly, the "index array" is proposed, which is used for discovering those itemsets that always appear together. Then, based on bitmap, an algorithm for computing index array is presented. Furthermore, frequent items are merged to initial nodes of CFIST according to heuristic information provided by index array. Finally, some new properties, which can avoid the generation of redundant nodes, are proposed. Thus the improved algorithm only generates closed nodes. Correspondingly, the unnecessary operations are avoided, and the search space is reduced to certain extents. The experimental results show that the proposed algorithm is efficient.
出处 《计算机科学》 CSCD 北大核心 2007年第8期165-167,189,共4页 Computer Science
基金 国家科技成果重点推广项目计划(2003EC000001)资助
关键词 数据挖掘 关联规则 频繁闭项集 索引数组 复合频繁模式树 Data mining, Association rule, Frequent closed itemset, Index array, Compound frequent itemset tree
  • 相关文献

参考文献8

  • 1Pasquier N,Bastide Y,Taouil R,Lakhal L.Discovering Frequent Closed Itemsets for Association Rules[C].In:Beeri C,Buneman P,eds.Proceedings of 7th International Conference Database Theory.Jerusalem:Springer-Verlag,1999.398-416
  • 2Zaki MJ,Hsiao CJ.CHARM:An Efficient Algorithm for Closed Itemset Mining[C].In:Grossman R,et al,eds.Proceedings of the 2nd SIAM International.Conference on Data Mining.Arlington:SIAM,2002.12-28
  • 3Lucchese C,Orlando S,Perego R.Fast and Memory Efficient Mining of Frequent Closed Itemsets[J].IEEE Transaction on Knowledge and Data Engineering,2006,18(1):21-36
  • 4Pei J,Han J,Mao R.CLOSET:An Efficient Algorithm for Mining Frequent Closed Itemsets[C].In:Gunopulos D,Rastogi R,eds.Proceedings of 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.Dallas:ACM Press,2000.21-30
  • 5Wang J Y,Han J,Pei J.CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets[C].In:Getoor L,Senator T E,Domingos P,Faloutsos C,eds.Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM Press,2003.236-245
  • 6刘君强,孙晓莹,庄越挺,潘云鹤.挖掘闭合模式的高性能算法[J].软件学报,2004,15(1):94-102. 被引量:19
  • 7缪裕青.频繁闭合项目集的并行挖掘算法研究[J].计算机科学,2004,31(5):166-168. 被引量:5
  • 8Frequent Itemset Mining Implementations Repository.http://fimi.cs.helsinki.fi/

二级参考文献16

  • 1[1]Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Beeri C, et al, eds. Proc. of the 7th Int'l. Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999. 398~416.
  • 2[2]Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Beeri C, et al, eds. Proc. of the 20th Int'l. Conf. on Very Large Databases. Santiago: Morgan Kaufmann Publishers, 1994. 487~499.
  • 3[3]Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Gunopulos D, et al, eds. Proc. of the 2000 ACM SIGMOD Int'l. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21~30.
  • 4[4]Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Georgakopoulos D, et al, eds. Proc. of the 17th Int'l. Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443~452.
  • 5[5]Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Grossman R, et al, eds. Proc. of the 2nd SIAM Int'l. Conf. on Data Mining. Arlington: SIAM, 2002. 12~28.
  • 6[6]Liu JQ, Pan YH, Wang K, Han J. Mining frequent item sets by opportunistic projection. In: Hand D, et al, eds. Proc. of the 8th ACM SIGKDD Int'l. Conf. on Knowledge Discovery and Data Mining. Alberta: ACM Press, 2002. 229~238.
  • 7[7]Srikant R. Quest synthetic data generation code. San Jose: IBM Almaden Research Center, 1994. http://www.almaden.ibm.com/ software/quest/Resources/index.shtml
  • 8[8]Blake C, Merz C. UCI Repository of machine learning. Irvine: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
  • 9[1]Agrawal R, et al. Mining association rules between sets of items in large databases. In: Proc. 1993 ACM SIGMOD Int'l Conf. Management of Data,Washington,D. C. ,May 1993
  • 10[2]Agrawal R,et al. Fast algorithms for mining association rules. In:Proc. 1994 VLDB Int'l Conf. ,Santiago,Chile,Sept. 1994

共引文献22

同被引文献29

  • 1侯东风,杨强,邓苏.一种基于多时间粒度的数据流建模方法[J].计算机工程与科学,2006,28(2):111-114. 被引量:2
  • 2潘云鹤,王金龙,徐从富.数据流频繁模式挖掘研究进展[J].自动化学报,2006,32(4):594-602. 被引量:34
  • 3战立强,刘大昕.基于概念格的频繁闭项集增量挖掘算法研究[J].哈尔滨工程大学学报,2007,28(2):194-197. 被引量:2
  • 4宋旭东,翟坤,刘晓冰.基于图论的频繁闭项集挖掘[J].微电子学与计算机,2007,24(8):28-30. 被引量:1
  • 5Han J W,et al.Frequent pattern mining:Current status and future directions[J].Data Mining and Knowledge Discovery,2007,15(1):55-86.
  • 6Cheng J,et al.A survey on algorithms for mining frequent itemsets over data streams[J].Knowledge and Information Systems,2008,16(1):1-27.
  • 7Agrawal R,et al.Fast algorithms for minging association rules[A].Proceedings of the 20th International Conference on Very Large Data Bases[C].San Francisco:Morgan Kaufmann Publisher,1994.
  • 8Han J,et al.Mining frequent patterns without candidate generation[A].Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data[C].Dallas:ACM,2000.
  • 9Pasquier N,et al.Discovering frequent closed itemsets for association rules[A].In Proceeding of the 7th International Conference on Database Theory[C].Israel:Springer,1999.
  • 10Pei J,et al.CLOSET:An efficient algorithm for mining frequent dosed item sets[A].Proceedings of ACMSIGMOD Workshop On Research Issues in Data Mining and Knowledge Discovery[C].Dallas:ACM,2001.

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部