Mining Maximal Frequent Itemsets by Intersection Pruning (交集剪枝法挖掘最大频繁项集). Cited by: 1

Algorithm for mining maximum frequent itemset based on intersection pruning


Abstract: Discovering maximal frequent itemsets is a key problem in data mining applications. To avoid generating a large number of candidate itemsets or building a frequent pattern tree, this paper proposes a new algorithm, IPA (Intersection Pruning Algorithm), which derives the maximal frequent itemsets over all attribute items from the maximal frequent itemsets associated with individual transaction itemsets. IPA uses intersection pruning to search for maximal frequent itemsets both top-down and bottom-up, and it exploits the distribution of attribute items, the intersections already generated, and other information to reduce the number of intersection operations. At most (1 - minimum support) × |D| + 1 transaction itemsets need to be intersected with the other transaction itemsets, where |D| is the number of transactions in the database D, which effectively lowers the time complexity. Experiments show that the algorithm is effective and feasible, and that it is easy to implement for use in KDD applications.
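The bound in the abstract can be illustrated with a small Python sketch. This is not the authors' IPA: it omits the attribute-distribution heuristics and the top-down/bottom-up search, and it enumerates intersections naively, so it is only suitable for toy data. The function name `maximal_frequent_itemsets` and the integer parameter `min_count` (the minimum support expressed as a transaction count) are assumptions made for this example. The sketch relies on a pigeonhole argument: an itemset contained in at least `min_count` transactions is missing from at most |D| - `min_count` of them, so one of the first |D| - `min_count` + 1 transactions must contain it. Only those "seed" transactions therefore need to be intersected with the rest.

```python
def maximal_frequent_itemsets(transactions, min_count):
    """Toy intersection-based miner (illustrative sketch, not the paper's IPA)."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    db = [frozenset(t) for t in transactions]
    n = len(db)
    # Pigeonhole bound: every frequent itemset lies inside one of these seeds.
    num_seeds = n - min_count + 1

    candidates = set()
    for i in range(num_seeds):
        # All non-empty intersections of seed i with subsets of the other transactions.
        closed = {db[i]}
        for j, t in enumerate(db):
            if j != i:
                closed |= {c & t for c in closed if c & t}
        candidates |= closed

    # Keep candidates that are frequent, then drop any with a frequent proper superset.
    frequent = [c for c in candidates if sum(c <= t for t in db) >= min_count]
    return {c for c in frequent if not any(c < other for other in frequent)}


demo = ["abcd", "abc", "abd", "cde", "acd"]  # each character is one item
print(sorted("".join(sorted(s)) for s in maximal_frequent_itemsets(demo, 3)))
# prints ['ab', 'ac', 'ad', 'cd']
```

With five transactions and `min_count = 3`, only the first three transactions act as seeds, which matches (1 - minimum support) × |D| + 1 when the minimum support is 3/5.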

Authors: Wang Le (王乐), Wang Shui (王水), Chen Bo (陈波), Dong Peng (董鹏)

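Affiliations: College of Information Engineering, Dalian University (大连大学信息工程学院); School of Software, Nanyang Institute of Technology (南阳理工学院软件学院)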

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 13, pp. 156-159 (4 pages)

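Funding: Open Project of the Liaoning Key Laboratory of Information Science and Engineering, No. 2005003; 2008 Dalian Research Fund for Outstanding IT Teachers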

Keywords: data mining; maximal frequent itemsets; candidate itemsets; intersection pruning

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


