期刊文献+

在FP-树中挖掘频繁模式而不生成条件FP-树 被引量:56

Mining Frequent Patterns in an FP-tree Without Conditional FP-tree Generation
下载PDF
导出
摘要 FP growth算法是目前已发表的最有效的频繁模式挖掘算法之一 然而 ,由于在挖掘频繁模式时需要递归地生成大量的条件FP 树 ,其时空效率仍然不够高 改进了FP 树结构 ,提出了一种基于被约束子树挖掘频繁项集的有效算法 改进的FP 树是单向的 ,每个结点只保留指向父结点的指针 ,这大约节省了三分之一的树空间 通过引入被约束子树(可以用 3个很小的数组表示 ) ,算法在挖掘频繁模式时不生成条件FP 树 ,从而大大提高了频繁模式挖掘的时空效率 实验表明 ,与FP growth算法相比 ,算法的挖掘速度提高了 1倍以上 ,而所需的存储空间减少了一半 此外 ,随着数据库规模的增大 ,算法具有很好的可伸缩性 对于稠密数据集 ,算法也具有良好的性能 . FP-growth algorithm is one of the most efficient frequent pattern mining methods published recently. However, FP-growth algorithm must generate a huge number of conditional FP-trees recursively in process of mining, so the efficiency of FP-growth remains unsatisfactory. In this paper, the structure of a traditional FP-tree is improved and an efficient frequent pattern-mining algorithm based on constrained sub-tree is proposed. The new FP-tree is a one-way tree and there is no pointers to point its children in each node, so at least one third of memory is saved compared with the former structure in the same storage of frequent pattern information. By introducing constrained sub-tree (consisting of three small arrays), the proposed algorithm doesn't generate conditional FP-trees in mining process and therefore greatly improves the mining efficiency in both time and space. Experiments show that in comparison with FP-growth, this algorithm has accelerated the mining speed by at least two times and reduced the space consumption by half. Moreover, the algorithm has a very good time and space scalability with the number of transactions, and has an excellent performance in dense database mining as well.
作者 范明 李川
出处 《计算机研究与发展》 EI CSCD 北大核心 2003年第8期1216-1222,共7页 Journal of Computer Research and Development
基金 河南省自然科学基金 ( 0 1110 60 70 0 )
关键词 数据挖掘 频繁模式 FP-树 data mining frequent pattern FP-tree
  • 相关文献

参考文献8

  • 1范明 等.数据挖掘:概念与技术[M].北京:机械工业出版社,2001.8.
  • 2R Agrawal, R Srikant. Fast algorithms for mining association rules. In: Proc of 1994 Int'l Conf on Very Large Data Bases.Santiago, Chili: VLDB Endowment, 1994. 487--499.
  • 3J S Park, M S Chen, P S Yu. An effective Hash-based algorithm for mining association rules. In: Proc of 1995 ACM-SIGMOD Int'l Cord on Management of Data. San Jose, CA: ACM Press,1995. 175--186.
  • 4S Brin, R Motwani, C Silvemtein. Beyond market basket:Generalizing association rules to correlations. In: Proe of 1997 ACM-SIGMOD Int'l Conf on Management of Data. Tucson, AZ:ACM Press, 1997. 265--276.
  • 5R Agrawal, R Srikant. Mining sequential patterns. In: ICDE'95. Taipei, Taiwan: IEEE Computer Society Press, 1995. 3--14.
  • 6G Dong, J Li. Efficient mining of emerging patterns: Discovering trends and differences. In: Proc of the 5th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining. San Diego, CA:ACM Press, 1999. 43~52.
  • 7J Han, J Pei, Y Yin. Mining frequent patterns without candidate generation. In: Proe of 2000 ACM-SIGMOD Int'l Conf on Management of Data. Dallas, TX: ACM Press, 2000. 1--12.
  • 8Artur Bykowski, Christophe Rigotti. A eondemsed representation to find frequent patterns. In: Proe of the 20th ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems(PODS 2001). Santa Barbara, CA: ACM Press, 2001. 267~273.

共引文献10

同被引文献415

引证文献56

二级引证文献190

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部