期刊文献+

一种新的频繁项集精简表示方法及其挖掘算法的研究 被引量:18

Research on a New Concise Representation of Frequent Itemset and Its Mining Algorithm
下载PDF
导出
摘要 频繁项集挖掘是数据挖掘研究领域的一个基本问题,其瓶颈在于频繁项集全集的结果过多,冗余现象严重.主要的解决思路是只挖掘全体频繁项集中有代表性的子集,使得这种子集或者可满足应用的需要或者可由它们导出其他项集.最大项集和闭项集便是这类解决方案中两种最典型的子集形式.在最大项集和闭项集的基础上,提出了元项集这一新的频繁项集精简表示方法.首先,证明了最大项集和闭项集都是元项集的特例,且元项集所包含的项集数目介于二者之间;其次,讨论了元项集的性质.最后,通过在闭项集挖掘算法DCI-Closed-Index的基础上引入剪枝策略,设计了一个元项集挖掘算法.实验结果表明,所提出的挖掘算法是有效的和高效的. Frequent itemset mining has become an important data mining task and a focused theme in data mining research. The bottlenecks of frequent itemset mining are as follows: On the one hand, the number of all frequent itemsets is usually extremely large. On the other hand, there is often severe redundancy in the resultant itemsets. To overcome these problems, recently several proposals have been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The aim is that the resultant subset can either satisfy the requirements of applications, or can derive all the other frequent itemsets. Maximal itemset and closed itemset are two most typical representative subsets of all frequent itemsets. Based on maximal itemset and closed itemset, a new concise representation of frequent itemset, namely meta itemset, is proposed. It is proved that both maximal itemset and closed itemset are special cases of meta itemset. The cardinality of meta itemset is between those of maximal itemset and closed ire:reset. Then, property of meta itemset is discussed. Finally, by introducing pruning strategies to DCI-closed-index, which is a closed itemset mining algorithm, an algorithm for mining meta itemset is proposed. Experimental results show that the proposed algorithm is effective and efficient.
出处 《计算机研究与发展》 EI CSCD 北大核心 2010年第2期277-285,共9页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60675030) 北京市优秀人才培养资助项目(2009D005002000009) 北方工业大学青年重点研究基金项目 北方工业大学博士科研启动基金项目~~
关键词 数据挖掘 关联规则 最大项集 闭项集 元项集 data mining association rule maximal itemset closed itemset meta itemset
  • 相关文献

参考文献10

  • 1Ceglar A, Roddick J F. Association mining [J]. ACM Computing Surveys, 2006, 38 (2): 1-42.
  • 2Piatetsky-Shapiro G. Data mining and knowledge discovery 1996 to 2005: Overcoming the hype and moving from "university" to "business" and "analytics"[J]. Data Mining Knowledge Discovery, 2007, 15(1): 99-105.
  • 3Han J, Cheng H, Xin D, et al. Frequent pattern mining: Current status and future directions[J]. Data Mining and Knowledge Discovery, 2007, 15(1): 55-86.
  • 4Calders T, Rigotti C, Boulicaut J F. A survey on condensed representations for frequent sets [G] //Constraint-Based Mining and Inductive Databases. Berlin: Springer, 2005:64- 80.
  • 5Song W, Yang B R, Xu Z Y. Index-MaxMiner: A new maximal frequent itemset mining algorithm [J]. International Journal on Artificial Intelligence Tools, 2008, 17(2): 303- 320.
  • 6秦亮曦,史忠植.SFPMax——基于排序FP树的最大频繁模式挖掘算法[J].计算机研究与发展,2005,42(2):217-223. 被引量:26
  • 7Lucchese C, Orlando S, Perego R. Fast and memory efficient mining of frequent closed itemsets [J]. IEEE Trans on Knowledge and Data Engineering, 2006, 18 (1): 21-36.
  • 8宋威,杨炳儒,徐章艳,高静.一种改进的频繁闭项集挖掘算法[J].计算机研究与发展,2008,45(2):278-286. 被引量:11
  • 9Liu G, Lu H, Lou W, et al. Efficient mining of frequent patterns using ascending frequency ordered prefix-tree [J]. Data Mining and Knowledge Discovery, 2004, 9 (3) : 249- 274.
  • 10Grahne G, Zhu J. Fast algorithms for frequent itemset mining using prefix-trees[J]. IEEE Trans on Knowledge and Data Engineering, 2005, 17(10): 1:347-1362.

二级参考文献21

  • 1秦亮曦,史忠植.SFPMax——基于排序FP树的最大频繁模式挖掘算法[J].计算机研究与发展,2005,42(2):217-223. 被引量:26
  • 2宋余庆,朱玉全,孙志挥,杨鹤标.一种基于频繁模式树的约束最大频繁项目集挖掘及其更新算法[J].计算机研究与发展,2005,42(5):777-783. 被引量:21
  • 3.[EB/OL].http://www. ics. uci. edu/~ mlearn/MLRepository. html,1996.
  • 4D. Burdick, M. Calimlim, J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: D.Georgakopoulos, et al, eds. Proc. of 17th Int'l Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443~452.
  • 5K. Gouda, M. Zaki. Efficiently mining maximal frequent itemsets. In: N. Cercone, T. Y. Lin, X. Wu, eds. Proc. of the 2001 IEEE Int'l Conf. on Data Mining. San Jose: IEEE Press,2001. 163~170.
  • 6R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large database. In: P. Buneman, S.Jajodia, eds. Proc. of 1993 ACM SIGMOD Conf. on Management of Data. Washington DC: ACM Press, 1993. 207~216.
  • 7R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In: J. Bocca, M. Jarke, C. Zaniolo, eds. Proc. of the 20th Int'l Conf. on Very Large Databases (VLDB' 94) .Santiago: Morgan Kaufmann, 1994. 487~499.
  • 8J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In: M. Dunham, J. Naughton, W. Chen,eds. Proc. of 2000 ACM-SIGMOD Int'l Conf. on Management of Data (SIGMOD'00). Dallas, TX, New York: ACM Press,2000. 1~ 12.
  • 9N. Pasquier, Y. Bastide, R. Taouil, et al.. Discovering frequent colsed itemsets for association rules. In: C. Beeri, et al,eds. Proc. of 7th Int'l Conf. on Database Theory. Jerusalem:Springer-Verlag, 1999. 398~416.
  • 10J. Pei, J. Han, R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In: D Gunopulos, et al, eds.Proc. of the 2000 ACM SIGMOD Int' l Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21~30.

共引文献34

同被引文献251

引证文献18

二级引证文献76

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部