Graph-Based Algorithm for Generating Maximal Frequent Itemsets (cited by: 2)

Maximal frequent itemset generation based on graph


Abstract: Mining frequent itemsets is one of the important techniques in data mining, and many classic algorithms already exist, such as Apriori and FP-tree. Mining frequent itemsets mainly comes down to finding the maximal frequent itemsets. To find them quickly, the usual approaches are pruning the candidate itemsets, reducing the number of database scans, and combining bottom-up with top-down search (also called bidirectional search). Bidirectional search effectively shrinks the search space. This paper combines graph-based association rule mining with the idea of bidirectional search to produce maximal frequent itemsets, and proposes a graph-based algorithm for generating maximal frequent itemsets. The algorithm uses a graph and maps the data to vectors, so that the information for all frequent itemsets can be built in a single database scan. Combined with bidirectional search, it generates frequent itemsets quickly and also performs well when the maximal frequent itemsets are long. Finally, the graph-based association rule mining algorithm is compared with the graph-based maximal frequent itemset algorithm, and the causes of the performance difference are analyzed.

Mining frequent itemsets is a basic and essential task in many data mining applications, such as association rule mining and long pattern discovery. Many classic algorithms have been introduced to find the frequent itemsets in a database, such as Apriori and FP-Tree. Generating maximal frequent itemsets plays an important role in frequent itemset mining, because every frequent itemset is a subset of some maximal frequent itemset. Research on efficient algorithms has followed three directions: reducing the number of candidates, reducing the number of database scans, and combining top-down and bottom-up search. Graph-based association rule mining is an effective way to find the maximal frequent itemsets while reducing both the number of candidates and the number of database scans. It maps the data in the database to bit vectors and builds the information for all itemsets in a single database scan. The support of an itemset is then computed by logical operations on the bit vectors. Some researchers have concentrated on improving the performance of graph-based frequent itemset generation by exploiting basic properties of the relation graph. The relation graph is constructed from the frequent 2-itemsets: each vertex represents an item, and an edge joins two vertices when the two corresponding items form a frequent 2-itemset. If an itemset is a frequent k-itemset, the vertices representing its items must form a complete subgraph of the relation graph.
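The bit-vector mapping and the relation graph described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy transactions, and the threshold `min_support=2` are all assumptions chosen for illustration, and each bit vector is stored as a plain Python integer.

```python
from itertools import combinations

def build_bit_vectors(transactions):
    """Single database scan: bit i of vectors[item] is set when
    transaction i contains that item."""
    vectors = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors

def support(itemset, vectors):
    """Support count = number of 1 bits in the AND of the item vectors."""
    items = list(itemset)
    combined = vectors[items[0]]
    for item in items[1:]:
        combined &= vectors[item]
    return bin(combined).count("1")

def build_relation_graph(vectors, min_support):
    """Vertices are frequent items; an edge joins two items whose
    pair is a frequent 2-itemset."""
    frequent_items = sorted(
        item for item in vectors if support([item], vectors) >= min_support
    )
    graph = {item: set() for item in frequent_items}
    for a, b in combinations(frequent_items, 2):
        if support([a, b], vectors) >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "d"},
    {"c", "d"},
]
vectors = build_bit_vectors(transactions)
graph = build_relation_graph(vectors, min_support=2)
```

After the single scan, every support query is answered from the integers alone, without touching the transactions again.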
Maximal frequent itemsets can therefore be found by way of the maximal complete subgraphs of the relation graph. To reduce the number of candidates when forming candidate (k+1)-itemsets from the frequent k-itemsets, the next vertex in the ordering is appended to the tail of a frequent k-itemset only if the new vertex has an edge to every one of its k items. A coding method for items has also been proposed, in which an item with a larger degree receives a smaller ordering code. In addition, some approaches change the undirected graph into a directed graph. Combined bottom-up and top-down search, known as Pincer-Search, is a search strategy that prunes the search space. The infrequent itemsets generated bottom-up are used to split the candidate maximal frequent itemsets generated top-down, and the frequent itemsets generated top-down reduce the number of itemsets that the bottom-up search must examine. This article is the first to combine graph-based association rule mining with Pincer-Search to generate maximal frequent itemsets, and it presents an algorithm based on that idea. Splitting the top-down candidates with the infrequent 2-itemsets generated bottom-up is the most costly task, because obtaining all maximal complete subgraphs from the infrequent 2-itemsets is an NP-complete problem. Generation of the full candidate set is therefore postponed, to avoid spending a lot of time generating candidate maximal frequent itemsets that may turn out not to be maximal frequent itemsets. Finally, the new algorithm is compared with the original graph-based association rule mining algorithm.
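The degree-based coding and the edge-constrained extension from k-itemsets to (k+1)-itemsets can be sketched as follows. This covers only the bottom-up half of the search and is not the paper's full Pincer-Search combination. The relation graph, the bit vectors (stored as integers, one bit per transaction), the threshold, and all function names are hypothetical values assumed for illustration.

```python
from functools import reduce
from operator import and_

def order_by_degree(graph):
    """Items with a larger degree receive a smaller ordering code."""
    ranked = sorted(graph, key=lambda v: (-len(graph[v]), v))
    return {item: code for code, item in enumerate(ranked)}

def support(itemset, vectors):
    """Support count = number of 1 bits in the AND of the item vectors."""
    return bin(reduce(and_, (vectors[i] for i in itemset))).count("1")

def extend(itemset, graph, code):
    """Candidate (k+1)-itemsets: append a later-coded vertex that has
    an edge to every item of the k-itemset."""
    last = code[itemset[-1]]
    common = set.intersection(*(graph[i] for i in itemset))
    return [itemset + (v,) for v in sorted(common, key=code.get) if code[v] > last]

def maximal_frequent_itemsets(graph, vectors, min_support):
    """Level-wise bottom-up search, then keep itemsets with no frequent superset."""
    code = order_by_degree(graph)
    level = [(v,) for v in sorted(graph, key=code.get)]
    frequent = []
    while level:
        frequent.extend(level)
        level = [
            candidate
            for itemset in level
            for candidate in extend(itemset, graph, code)
            if support(candidate, vectors) >= min_support
        ]
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < t for t in sets)]

# Hypothetical toy input: five transactions encoded as bit vectors, and the
# relation graph whose edges are the frequent 2-itemsets at min_support=2.
vectors = {"a": 0b00111, "b": 0b01111, "c": 0b10011, "d": 0b11100}
graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
result = maximal_frequent_itemsets(graph, vectors, min_support=2)
```

Because a vertex is appended only when it is adjacent to all k items, an itemset containing an infrequent pair is never generated as a candidate, so no support computation is spent on it.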

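Authors: Liu Hongxing, Wang Chongjun, Xie Junyuan

Affiliation: State Key Laboratory for Novel Software Technology

Source: Journal of Nanjing University (Natural Science), 2008, No. 5, pp. 520-526 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (60503021, 60721002, 60875038); Jiangsu Province High-Tech Program (BG2007038, BG2006027)

Keywords: association rules, maximal frequent itemsets, graph

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]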


