
Algorithm for clustering frequent itemsets based on generators
Abstract: Effectively reducing the number of frequent itemsets is an active topic in data mining research, and clustering frequent itemsets is one way to address it. Because generators are a lossless, concise representation of all frequent itemsets, clustering the generators has the same effect as clustering all frequent itemsets. This paper proposes an algorithm for clustering frequent itemsets based on generators. First, the rationale for choosing generators as the objects to cluster is discussed using the minimum description length principle. Second, pruning strategies and a mining algorithm for generators are given. Finally, a clustering algorithm for generators is presented, based on a new similarity measure for itemsets. Experimental results show that the method effectively reduces the number of itemsets and mines them efficiently.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core (北大核心), 2008, No. 35, pp. 5-8 (4 pages)
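Funding: National Natural Science Foundation of China (No. 60675030); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality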
Keywords: data mining; generator; clustering
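
To make the notion of a generator in the abstract concrete, here is a minimal Python sketch. It uses the standard definition (an itemset is a generator if no proper subset has the same support) and a brute-force enumeration. It does not reproduce the paper's pruning strategies, mining algorithm, or similarity measure, none of which are specified in this record; the function names and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_generators(transactions, min_support):
    """Return {generator: support} for all frequent generators.

    Illustrative brute force, not the paper's algorithm. An itemset is a
    generator when no proper subset has the same support. Because support
    never increases as items are added, it is enough to compare against the
    subsets obtained by removing a single item.
    """
    items = sorted(set().union(*transactions))
    generators = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup < min_support:
                continue  # infrequent itemsets are ignored
            if all(support(candidate - {i}, transactions) != sup
                   for i in candidate):
                generators[candidate] = sup
    return generators


# Hypothetical toy database of four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
]

for itemset, sup in frequent_generators(transactions, min_support=2).items():
    print(sorted(itemset), sup)
```

With a minimum support of 2, this toy database has eight frequent itemsets (counting the empty set) but only three generators: the empty set, {b}, and {c}. For example, {a} is not a generator because it occurs in every transaction and therefore has the same support as the empty set. This is the kind of reduction that motivates clustering generators instead of all frequent itemsets.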

