Mining Maximal Frequent Itemsets by Intersection Pruning (cited by: 1)

Algorithm for mining maximum frequent itemset based on intersection pruning
Abstract: Discovering maximal frequent itemsets is a key problem in data mining applications. To avoid generating large numbers of candidate itemsets or building a frequent pattern tree, this paper proposes a new algorithm, IPA (Intersection Pruning Algorithm), which derives the maximal frequent itemsets over all attribute items from the maximal frequent itemsets associated with individual transaction itemsets. The algorithm uses intersection pruning to search for maximal frequent itemsets both top-down and bottom-up, and it draws on several kinds of information, including the distribution of attribute items and the intersections already generated, to reduce the number of intersections computed. At most (1 - minimum support) × |D| + 1 transaction itemsets need to be intersected with the other transaction itemsets, which effectively lowers the time complexity. Experiments show that the algorithm is effective and feasible, and that it is easy to implement for use in KDD applications.
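The abstract does not give the IPA procedure itself, so the following is only a minimal sketch of the underlying idea, not the authors' algorithm. It relies on a standard fact: a maximal frequent itemset equals the intersection of all transactions that contain it, so it must appear among the intersections of transaction itemsets. The function names `maximal_frequent_itemsets` and `seed_count`, the absolute threshold `min_count` (assumed to be at least 1 and to stand for minimum support × |D|), and the sample database are all assumptions made for illustration. The sketch builds every intersection naively and applies none of the pruning the abstract describes.

```python
def maximal_frequent_itemsets(transactions, min_count):
    """Naive cumulative-intersection miner (illustrative, not the paper's IPA).

    transactions: iterable of item collections.
    min_count: absolute support threshold, assumed to be >= 1.
    """
    db = [frozenset(t) for t in transactions]

    # Build every itemset that is an intersection of one or more transactions.
    # A maximal frequent itemset equals the intersection of the transactions
    # containing it, so it is guaranteed to appear in this family.
    closed = set()
    for t in db:
        closed |= {c & t for c in closed} | {t}
    closed.discard(frozenset())

    def support(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for t in db if itemset <= t)

    frequent = [c for c in closed if support(c) >= min_count]
    # Keep only itemsets that have no frequent proper superset.
    return [c for c in frequent if not any(c < other for other in frequent)]


def seed_count(n_transactions, min_count):
    """Transactions needed so that every frequent itemset lies in at least one.

    A frequent itemset is missing from at most n - min_count transactions,
    so any n - min_count + 1 transactions include one that contains it.
    """
    return n_transactions - min_count + 1


# Hypothetical five-transaction database; each string is a set of one-letter items.
db = ["abc", "abd", "abcd", "bc", "acd"]
result = maximal_frequent_itemsets(db, min_count=3)
print(sorted("".join(sorted(s)) for s in result))  # ['ab', 'ac', 'ad', 'bc']
print(seed_count(len(db), 3))  # 3
```

With |D| = 5 and a minimum support of 60% (`min_count` = 3), `seed_count` gives 5 - 3 + 1 = 3, which is one reading of the (1 - minimum support) × |D| + 1 bound: every maximal frequent itemset in the example is a subset of one of the first three transactions.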
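Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2009, No. 13, pp. 156-159 (4 pages)
Funding: Open Project of the Liaoning Key Laboratory of Information Science and Engineering (No. 2005003); 2008 Dalian IT Outstanding Teacher Research Fund
Keywords: data mining; maximum frequent itemsets; candidate itemsets; intersection; pruning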
References (9)

  • 1. Ceglar A, Roddick J F. Association mining[J]. ACM Computing Surveys, 2006, 38(2): 1-42.
  • 2. Rigoutsos I, Floratos A. Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm[J]. Bioinformatics, 1998, 14(1): 55-67.
  • 3. Bayardo R J. Efficiently mining long patterns from databases[C]//Haas L M, Tiwary A. Proceedings ACM SIGMOD International Conference on Management of Data, 1998: 85-93.
  • 4. Lin D I, Kedem Z M. Pincer-search: A new algorithm for discovering the maximum frequent set[C]//Schek H J. Proceedings of 6th International Conference on Extending Database Technology, 1998: 105-119.
  • 5. Agarwal R C, Aggarwal C C, Prasad V V V. Depth first generation of long patterns[C]//Ramakrishnan R, Stolfo S. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000: 108-118.
  • 6. Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases[C]//Georgakopoulos D. Proceedings of the 17th International Conference on Data Engineering, 2001: 443-452.
  • 7. Gouda K, Zaki M J. Efficiently mining maximal frequent itemsets[C]//Cercone N, Lin T Y, Wu X D. Proceedings of the 2001 IEEE International Conference on Data Mining, 2001: 163-170.
  • 8. Song Yuqing, Zhu Yuquan, Sun Zhihui, Chen Geng. FP-Tree-based algorithm for mining and updating maximal frequent itemsets[J]. Journal of Software (软件学报), 2003, 14(9): 1586-1592. Cited by: 164
  • 9. Li Qinghua, Wang Hui, Jiang Shengyi. A parallel algorithm for mining maximal frequent itemsets[J]. Computer Science (计算机科学), 2004, 31(12): 132-134. Cited by: 5

Second-level references (7)

  • 1. Burdick D, Calimlim M, Gehrke J. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In: Proc. of 17th Intl. Conf. on Data Engineering, Heidelberg, Germany, April 2001. 443-452
  • 2. Gouda K, Zaki M J. Efficiently Mining Maximal Frequent Itemsets. In: Proc. of 2001 IEEE Intl. Conf. on Data Mining (ICDM'01), San Jose, California, November 2001. 163-170
  • 3. Wang Hui, Li Qinghua, Ma Chuanxiang, Li Kenli. A Maximal Frequent Itemset Algorithm. Lecture Notes in Computer Science, Springer, 2003, 2639: 484-490
  • 4. Agrawal R, Shafer J C. Parallel Mining of Association Rules. IEEE Transactions on Knowledge and Data Engineering, Dec. 1996, 8(6): 962-969
  • 5. Zaki M J, Parthasarathy S, Ogihara M, Li Wei. New Parallel Algorithms for Fast Discovery of Association Rules. Data Mining and Knowledge Discovery: An International Journal, special issue on Scalable High-Performance Computing for KDD, Dec. 1997, 1(4): 34
  • 6. Zaki M J. Parallel and Distributed Association Mining: A Survey. IEEE Concurrency, 1999, 7(4): 14-25
  • 7. Lu Songfeng, Lu Zhengding. Fast mining of maximal frequent itemsets[J]. Journal of Software (软件学报), 2001, 12(2): 293-297. Cited by: 113

Co-citing documents: 165

Co-cited documents: 11

Citing documents: 1

Second-level citing documents: 4
