Maximal frequent itemset mining algorithm based on FP-tree and support array (cited by: 2)

Efficient mining maximal frequent itemsets by using FP-tree and support array

Abstract: This paper proposes an algorithm for mining maximal frequent itemsets that combines the frequent pattern tree (FP-tree) with a support array. The FP-tree and the support array are built at the same time. A maximal frequent itemsets tree, the MAXFP-tree, is then built on top of them. Because the MAXFP-tree contains all maximal frequent itemsets, the search space shrinks and the algorithm becomes more efficient. Algorithm analysis and experiments show that the algorithm applies to both dense and sparse datasets, and that it is especially well suited to mining datasets with long frequent itemsets.
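The abstract does not spell out the data structures, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes the "support array" holds pairwise co-occurrence counts of frequent items (as in some FP-growth variants) and fills it during the same scan that inserts transactions into the FP-tree. The names `Node`, `build_fp_tree`, `pair_support`, and `maximal_frequent_itemsets` are hypothetical, and the last function is a brute-force reference for the definition of a maximal frequent itemset, not the MAXFP-tree construction.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and a pairwise support array in the same scan."""
    # First scan: count single items and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Fixed global order: descending support, ties broken by item name.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = Node(None)
    # Assumed form of the support array: (item_a, item_b) -> co-occurrence count.
    pair_support = Counter()
    # Second scan: insert each filtered, sorted transaction into the tree
    # and update the pairwise counts at the same time.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
        pair_support.update(combinations(items, 2))
    return root, order, pair_support


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force reference: frequent itemsets with no frequent proper superset."""
    _, order, _ = build_fp_tree(transactions, min_support)
    frequent_sets = [
        frozenset(c)
        for k in range(1, len(order) + 1)
        for c in combinations(order, k)
        if support(c, transactions) >= min_support
    ]
    return {s for s in frequent_sets if not any(s < o for o in frequent_sets)}
```

With pairwise counts available, a pair whose count falls below the threshold can be ruled out without walking the tree, which is one plausible way such an array could narrow the search. The brute-force function enumerates every combination and is only practical for tiny inputs.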

Authors: Chen Huiping (陈慧萍), Wang Jiandong (王建东), Ye Feiyue (叶飞跃), Wang Yu (王煜)

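Affiliations: College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics; College of Computer and Information Engineering, Hohai University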

Source: Systems Engineering and Electronics (《系统工程与电子技术》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 9, pp. 1631-1635 (5 pages)

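Funding: Supported by the National 973 Program basic research development fund (G1999032701) and the Natural Science Foundation of Jiangsu Province (BK2002091)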

Keywords: data mining; FP-tree; MAXFP-tree; support array; maximal frequent itemsets

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


