基于索引数组和复合频繁模式树的频繁闭项集挖掘算法被引量：1

Closed Itemset Mining Algorithm Based on Index Array and Compound Frequent Itemset Tree

下载PDF

导出

摘要频繁闭项集惟一确定频繁项集且规模小得多。CROP是一种基于复合频繁模式树的、频繁闭项集高效挖掘算法,但存在着候选结点过多的问题。这些非闭合结点的生成、检查和剪裁带来了大量不必要的操作。提出了一种改进的频繁闭项集挖掘算法CROP_Index。该算法用"索引数组"来组织数据,找到频繁共同出现的项集。基于二进制位图,给出了一个包含索引的计算方法,并利用索引启发信息合并,得到复合型频繁模式树的初始结点;同时给出一些新的性质,使得改进的算法只生成闭合结点,从而节省了大量不必要的操作,缩小了搜索空间。实验结果表明该算法效率较高。 The set of frequent closed itemsets determines exactly the complete set of all frequent itemsets and is usually much smaller than the latter. Based on compound frequent itemset tree （CFIST）, CROP is an efficient algorithm for mining frequent closed itemset. However, there are a lot of redundant candidate nodes of CFIST. Correspondingly, the operations, composed of the generation, closure checking, and pruning of those non-closed nodes, lead to high computational cost. In this paper, CROP Index, which is an improved algorithm for mining frequent closed itemset, is proposed. Firstly, the ＂index array＂ is proposed, which is used for discovering those itemsets that always appear together. Then, based on bitmap, an algorithm for computing index array is presented. Furthermore, frequent items are merged to initial nodes of CFIST according to heuristic information provided by index array. Finally, some new properties, which can avoid the generation of redundant nodes, are proposed. Thus the improved algorithm only generates closed nodes. Correspondingly, the unnecessary operations are avoided, and the search space is reduced to certain extents. The experimental results show that the proposed algorithm is efficient.

作者宋威杨炳儒徐章艳张桃红

机构地区北京科技大学信息工程学院

出处《计算机科学》 CSCD 北大核心 2007年第8期165-167,189,共4页 Computer Science

基金国家科技成果重点推广项目计划(2003EC000001)资助

关键词数据挖掘关联规则频繁闭项集索引数组复合频繁模式树 Data mining, Association rule, Frequent closed itemset, Index array, Compound frequent itemset tree

分类号 TP311.13 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献8

1Pasquier N,Bastide Y,Taouil R,Lakhal L.Discovering Frequent Closed Itemsets for Association Rules[C].In:Beeri C,Buneman P,eds.Proceedings of 7th International Conference Database Theory.Jerusalem:Springer-Verlag,1999.398-416
2Zaki MJ,Hsiao CJ.CHARM:An Efficient Algorithm for Closed Itemset Mining[C].In:Grossman R,et al,eds.Proceedings of the 2nd SIAM International.Conference on Data Mining.Arlington:SIAM,2002.12-28
3Lucchese C,Orlando S,Perego R.Fast and Memory Efficient Mining of Frequent Closed Itemsets[J].IEEE Transaction on Knowledge and Data Engineering,2006,18(1):21-36
4Pei J,Han J,Mao R.CLOSET:An Efficient Algorithm for Mining Frequent Closed Itemsets[C].In:Gunopulos D,Rastogi R,eds.Proceedings of 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.Dallas:ACM Press,2000.21-30
5Wang J Y,Han J,Pei J.CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets[C].In:Getoor L,Senator T E,Domingos P,Faloutsos C,eds.Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM Press,2003.236-245
6刘君强,孙晓莹,庄越挺,潘云鹤.挖掘闭合模式的高性能算法[J].软件学报,2004,15(1):94-102. 被引量：19
7缪裕青.频繁闭合项目集的并行挖掘算法研究[J].计算机科学,2004,31(5):166-168. 被引量：5
8Frequent Itemset Mining Implementations Repository.http://fimi.cs.helsinki.fi/

二级参考文献16

1[1]Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Beeri C, et al, eds. Proc. of the 7th Int'l. Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999. 398～416.
2[2]Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Beeri C, et al, eds. Proc. of the 20th Int'l. Conf. on Very Large Databases. Santiago: Morgan Kaufmann Publishers, 1994. 487～499.
3[3]Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Gunopulos D, et al, eds. Proc. of the 2000 ACM SIGMOD Int'l. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21～30.
4[4]Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Georgakopoulos D, et al, eds. Proc. of the 17th Int'l. Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443～452.
5[5]Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Grossman R, et al, eds. Proc. of the 2nd SIAM Int'l. Conf. on Data Mining. Arlington: SIAM, 2002. 12～28.
6[6]Liu JQ, Pan YH, Wang K, Han J. Mining frequent item sets by opportunistic projection. In: Hand D, et al, eds. Proc. of the 8th ACM SIGKDD Int'l. Conf. on Knowledge Discovery and Data Mining. Alberta: ACM Press, 2002. 229～238.
7[7]Srikant R. Quest synthetic data generation code. San Jose: IBM Almaden Research Center, 1994. http://www.almaden.ibm.com/ software/quest/Resources/index.shtml
8[8]Blake C, Merz C. UCI Repository of machine learning. Irvine: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/～mlearn/MLRepository.html
9[1]Agrawal R, et al. Mining association rules between sets of items in large databases. In: Proc. 1993 ACM SIGMOD Int'l Conf. Management of Data,Washington,D. C. ,May 1993
10[2]Agrawal R,et al. Fast algorithms for mining association rules. In:Proc. 1994 VLDB Int'l Conf. ,Santiago,Chile,Sept. 1994

共引文献22

1赵明,王寒凝.基于车牌照识别技术的OD调查系统分析与设计[J].公路交通科技（应用技术版）,2008,4(12):188-190. 被引量：7
2张莹,韩芳溪,柴乔林.基于频繁模式树的AOI聚类算法[J].计算机工程与应用,2004,40(35):178-179.
3刘学军,徐宏炳,董逸生,钱江波,王永利.基于滑动窗口的数据流闭合频繁模式的挖掘[J].计算机研究与发展,2006,43(10):1738-1743. 被引量：26
4杨萍,李立乡,杨明.快速更新频繁闭合项目集算法[J].计算机工程与应用,2006,42(36):148-151. 被引量：1
5刘旭,毛国君,孙岳,刘椿年.数据流中频繁闭项集的近似挖掘算法[J].电子学报,2007,35(5):900-905. 被引量：14
6程转流,胡为成,胡学钢.基于DSFCI-tree的分布式数据流频繁闭合模式挖掘[J].微电子学与计算机,2007,24(9):120-122. 被引量：2
7缪裕青,尹东.分布式存储结构的频繁闭合模式挖掘并行算法[J].微电子学与计算机,2007,24(10):161-163. 被引量：3
8缪裕青,陈国良,徐云.基因表达数据的频繁闭合模式挖掘新算法[J].中国科学技术大学学报,2007,37(9):1080-1087. 被引量：1
9郭宇红,童云海,唐世渭,杨冬青.基于FP-Tree的反向频繁项集挖掘[J].软件学报,2008,19(2):338-350. 被引量：20
10缪裕青,金波,陈国良.HTCLOSE：快速挖掘微阵列数据集中的频繁闭合模式[J].小型微型计算机系统,2008,29(2):274-278.

同被引文献29

1侯东风,杨强,邓苏.一种基于多时间粒度的数据流建模方法[J].计算机工程与科学,2006,28(2):111-114. 被引量：2
2潘云鹤,王金龙,徐从富.数据流频繁模式挖掘研究进展[J].自动化学报,2006,32(4):594-602. 被引量：34
3战立强,刘大昕.基于概念格的频繁闭项集增量挖掘算法研究[J].哈尔滨工程大学学报,2007,28(2):194-197. 被引量：2
4宋旭东,翟坤,刘晓冰.基于图论的频繁闭项集挖掘[J].微电子学与计算机,2007,24(8):28-30. 被引量：1
5Han J W,et al.Frequent pattern mining:Current status and future directions[J].Data Mining and Knowledge Discovery,2007,15(1):55-86.
6Cheng J,et al.A survey on algorithms for mining frequent itemsets over data streams[J].Knowledge and Information Systems,2008,16(1):1-27.
7Agrawal R,et al.Fast algorithms for minging association rules[A].Proceedings of the 20th International Conference on Very Large Data Bases[C].San Francisco:Morgan Kaufmann Publisher,1994.
8Han J,et al.Mining frequent patterns without candidate generation[A].Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data[C].Dallas:ACM,2000.
9Pasquier N,et al.Discovering frequent closed itemsets for association rules[A].In Proceeding of the 7th International Conference on Database Theory[C].Israel:Springer,1999.
10Pei J,et al.CLOSET:An efficient algorithm for mining frequent dosed item sets[A].Proceedings of ACMSIGMOD Workshop On Research Issues in Data Mining and Knowledge Discovery[C].Dallas:ACM,2001.

引证文献1

1潘怡,杜红燕.数据流频繁闭项集挖掘研究[J].长沙大学学报,2010,24(5):64-67.

1张忠平,李岩,林志杰,王爱杰.基于索引数组的频繁项集挖掘算法[J].计算机应用研究,2009,26(1):44-46. 被引量：2
2宋威,杨炳儒,徐章艳,韩彦岭.一种基于索引数组的频繁项集高效挖掘算法[J].高技术通讯,2008,18(3):259-264. 被引量：1
3宋威,杨炳儒,徐章艳,高静.一种改进的频繁闭项集挖掘算法[J].计算机研究与发展,2008,45(2):278-286. 被引量：11
4毛建景.基于索引数组的改进关联规则算法[J].洛阳师范学院学报,2015,34(8):65-67.
5严菲,杨科华.基于索引数组的频繁项集增量更新算法[J].微计算机信息,2010,26(33):150-151. 被引量：1
6董改芳,许道云.游戏中寻找路径的改进算法[J].计算机工程与应用,2009,45(23):38-39. 被引量：3
7宋威,杨炳儒,徐章艳,侯伟.基于索引数组与集合枚举树的最大频繁项集挖掘算法[J].计算机科学,2007,34(7):146-149. 被引量：4
8丛丹,王俊普,杨文,张绍一.一种新的关联规则的高效挖掘算法[J].计算机应用研究,2003,20(11):57-58. 被引量：4
9王国峰,张科,李言俊.基于小波多分辨分析的图像融合方法[J].红外与激光工程,2003,32(5):513-515. 被引量：15
10殷海明,张明敏,潘志庚.一种织物彩色图像的分割算法[J].计算机应用,2005,25(4):966-967. 被引量：5

计算机科学

2007年第8期

浏览历史

内容加载中请稍等...

基于索引数组和复合频繁模式树的频繁闭项集挖掘算法被引量：1

参考文献8

二级参考文献16

共引文献22

同被引文献29

引证文献1

相关作者

相关机构

相关主题

浏览历史

基于索引数组和复合频繁模式树的频繁闭项集挖掘算法 被引量：1

参考文献8

二级参考文献16

共引文献22

同被引文献29

引证文献1

相关作者

相关机构

相关主题

浏览历史

基于索引数组和复合频繁模式树的频繁闭项集挖掘算法被引量：1