A New Pattern-Tree Method for Fast Frequent Itemset Mining

Research on new mining algorithm of frequent itemset


Abstract: In the FP_growth algorithm, constructing and traversing the FP_tree and the conditional FP_trees accounts for most of the running time. To reduce this cost, this paper proposes a new fast method, the improved hierarchy FP_tree (IHFP_tree). The method first scans the database once to generate the equivalence class of each item; it then removes the non-frequent items and rewrites the equivalence classes of the remaining frequent items; finally, it constructs the FP_tree. Introducing the concept of hierarchical frequent patterns greatly improves the time and space efficiency of the mining process. The time and space complexity of the method are compared with those of other commonly used frequent pattern mining algorithms, and experiments show that IHFP_tree mines much faster than the FP_tree method.
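The three steps named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: the "equivalence class" of an item is taken here to be the set of IDs of the transactions that contain it, and the tree is a plain FP_tree-style prefix tree without node links or the paper's hierarchy structure. The names `FPNode`, `item_tidsets`, `prune_infrequent`, and `build_tree` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


class FPNode:
    """One prefix-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def item_tidsets(transactions):
    """Step 1: one pass over the database.

    Maps each item to the set of transaction IDs that contain it
    (the assumed meaning of an item's "equivalence class").
    """
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)
    return dict(tidsets)


def prune_infrequent(tidsets, min_support):
    """Step 2: keep only items whose ID set has at least min_support members."""
    return {item: tids for item, tids in tidsets.items() if len(tids) >= min_support}


def build_tree(frequent, n_transactions):
    """Step 3: build the prefix tree from the pruned ID sets, without rescanning.

    Items are ordered by descending support (ties broken by item name), and
    each transaction is rebuilt in that order from the ID sets.
    """
    order = sorted(frequent, key=lambda item: (-len(frequent[item]), item))
    rewritten = [[] for _ in range(n_transactions)]
    for item in order:
        for tid in frequent[item]:
            rewritten[tid].append(item)

    root = FPNode()
    for items in rewritten:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root
```

For example, calling `item_tidsets` on a list of transactions, passing the result through `prune_infrequent` with a chosen support threshold, and then calling `build_tree` yields a tree whose root-level children carry the counts of the most frequent leading items. The point of the sketch is that the original database is read only once; the tree is built from the per-item ID sets rather than from a second scan.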

Authors: Wang Jinghong (王静红), Liu Lina (刘丽娜), Geng Zongke (耿宗科)

Affiliations: College of Information Technology, Hebei Normal University; Hebei Agricultural University

Source: Application Research of Computers (《计算机应用研究》; indexed in CSCD and the Peking University Core Journals list), 2008, No. 8, pp. 2325-2327 (3 pages)

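Funding: National Natural Science Foundation of China (60675014); Hebei Provincial Department of Science and Technology (042135126); Natural Science Foundation of the Hebei Provincial Department of Education (2007474)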

Keywords: FP_tree; IHFP_tree; frequent pattern; equivalence class

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]



