期刊文献+

基于图的最大频繁项集的生成算法 被引量:2

Maximal frequent itemset feneration based on graph
下载PDF
导出
摘要 挖掘频繁项集是数据挖掘的重要技术之一,目前已有很多经典算法,如:apriori算法,FP-tree等.挖掘频繁项集主要是寻找最大频繁项集,为了快速寻找最大频繁项集,通常采用削减候选项集、减少扫描数据库次数的方法和将自底向上与自顶向下的搜索方法结合起来(又称双向搜索).双向搜索能有效地缩减搜索空间.本文把基于图的关联规则挖掘和双向搜索的思想结合起来产生最大频繁项集,提出了基于图的最大频繁项集生成算法.此算法用图将数据映射到一个向量上,通过一遍扫描数据库就可以构造整个频繁项集,结合双向搜索,能快速生成频繁项集,对产生较大长度的最大频繁项集也有较好的效果.文末,把基于图的关联规则挖掘算法和基于图的最大频繁项集算法进行了比较,分析出性能差别的原因. Mining frequent itemsets is a basic and essential task in many data mining applications such as association rules mining and long patterns discovery. Many classic algorithms have been introduced to find the frequent itemsets in database, such as aprior and FP-Tree. Maximal frequent itemsets generation plays an important role in the frequent itemsets mining, because all the frequent itemsets are the subset of the maximal frequent itemsets. Researchers focused on developing efficient algorithm to find frequent itemsets on the following three categories: reducing the number of candidate number, database scan and combining top-down and bottom up search. Graph-based association rules mining is an excellent method to find the maximal frequent itemsets so as to reduce the number of candidate and the number of database scan. The paradigm maps the data in database to bit vector and construct the entire itemsets information by one database scan. The support of itemsets can be calculated by the logic opreration among bit vetors. Some researchers concentrated on the uplife the performance in graph-based frequent itemsets generation by the basic property of relation graph. Relation graph is constructed by the 2-frequent itemsets in which the vertex presents the specific item, and the edge exsits between two vertexs if the two specific corresponding items are the 2- frequent itemset. Once one itemset is k-frequent itemsets, the subgraph of the vertexs presenting the items in the itemset must be the maximal complete subgraph of the relation graph. That is the way to find the maximal frequent itemsets by using the maximal complete subgraph in the relation graph. To reduce the number of the candidate in the context of forming the k+ 1 frequent candidate itemsets from the k-frequent itemsets, the next ordering vertex was added to the tail of the k frequent itemsets on the condition that the new add vertex must have edge with the k items in k-frequent itemsets. The coding method of items was also proposed, in which the item with bigger degee has the smaller ordering code. Besides, some change the undirected graph to directed graph. Bottom-up and top-down search named by Pincer-Search is a search stradgy to cut off the search space. The bottom-up generated non-frequent itemsets can be used to split the top-down maximal frequent itemsets generation, and the top-down generated frequent itmesets can reduce the number of the bottom-up frequent itemsets. The idea of combining the association rule mining based on graph with Pincer-Search to generate maximal frequent itemsets is first introduced in the article, and the algorithm based on the idea is also presented. The bottom up generated 2-non frequent itmesets splits the top-down frequent itemsets is the most costing task, because the problem that the all maximal complete subgrah is got by the 2-non frequent itemsets is NPC problem. The time of generating all candidates is postponed to avoid costing lots of time to generate the candidate maximal frequent candidate itemsets which may not be the real maximal frequent itemsets. Finally, we compare the new algorithm with primitive graph-based association rules mining.
出处 《南京大学学报(自然科学版)》 CAS CSCD 北大核心 2008年第5期520-526,共7页 Journal of Nanjing University(Natural Science)
基金 国家自然科学基金(60503021,60721002,60875038) 江苏省高新技术计划(BG2007038,BG2006027)
关键词 关联规则 最大频繁项集 association rules, maximal frequent itemsets, graph
  • 相关文献

参考文献15

二级参考文献74

  • 1朱玉全,宋余庆,陈耿.关联规则挖掘中增量式更新算法的研究[J].计算机工程与应用,2005,41(15):186-187. 被引量:8
  • 2陈凯,冯全源.最大频繁项集的高效挖掘[J].微电子学与计算机,2005,22(8):22-25. 被引量:13
  • 3贾彩燕 倪现君.关联规则挖掘研究述评[J].计算机科学,2003,30(4):145-148.
  • 4J S Park,M S Chen,P S Yu.An Effective Hash-Based Algorithm for Mining Association Rules[A].Proc 1995 ACM-SIGMOD Int'l Conf Management of Data[C].1995.175-186.
  • 5A Savasere,E Omiecinski,S Navathe.An Efficient Algorithm for Mining Association Rules in Large Database[A].Proc 1995 Int'l Conf Very Large Data[C].1995.432-443.
  • 6S Brin,R Motwani,J D Ullman,et al.Dynamic Itemset Counting and Implication Rules for Market Basket Analysis[A].Proc 1997 ACM-SIGMOD Int'l Conf Management of Data[C].1997.225-264.
  • 7Fenando Berzal,Juan-Carlos Cubero,Nicolas Marin, et al. TBAR:An Efficient Method for Association Rule Mining in Relational Databases[J].IEEE Trans on Data and Knowledge Engineering,2001,13(1):47-64.
  • 8Show-Jane Yen, Arbee L P Chen. A Graph-Based Approach for Discovering Various Types of Association Rules[J].IEEE Trans on Knowledge and Data Engineering,2001,13(5):839-845.
  • 9R Agrawal,R Srikant.Fast Algorithms for Mining Association Rules in Large Databases[R].Research Report RJ 9838,IBM Almaden Reserch Center,1994.
  • 10R. Agrawal and R. Srikant. Fast algorithms for mining association rules in Large Database. In Proceedings of the 20^th International Conference on Very Large Data Base, Santiago deChile, Chile, 1994, 487-499.

共引文献59

同被引文献19

引证文献2

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部