Pruning strategy for mining maximal frequent itemsets (最大频繁项集挖掘中搜索空间的剪枝策略)

Cited by: 5

Abstract: Mining maximal frequent itemsets is a fundamental problem that applies to many important web mining tasks. To reduce the search space effectively, this paper presents ESEquivPS (extension support equivalency pruning strategy), a new search space pruning strategy for mining maximal frequent itemsets. ESEquivPS is based on a depth-first traversal of the lexicographic subset enumeration tree. It prunes the search space by exploiting the property that an item appearing in the extension sets of both a child node and its parent node has the same extension support in each. The strategy is then used to improve and optimize the MAFIA (maximal frequent itemset algorithm) algorithm. Experimental results show that the strategy reduces the search space effectively: on datasets that are sparse but contain long frequent itemsets, the search space shrinks by two thirds and the running time improves by a factor of 3 to 5 over the original MAFIA algorithm.
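The setting described in the abstract can be illustrated with a small sketch. It is not the paper's ESEquivPS and not MAFIA's implementation; the record does not give enough detail to reproduce either. It only shows a depth-first traversal of a lexicographic subset enumeration tree, plus a simpler, well-known support-equality shortcut (an extension whose support equals the support of the current itemset is absorbed instead of branched on), so that the effect of an equality-based pruning rule on the number of visited nodes can be observed. The names `mine_maximal` and `use_equivalence` are hypothetical and chosen for this example only.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def mine_maximal(transactions: Sequence[Set[str]], min_support: int,
                 use_equivalence: bool = True) -> Tuple[Set[FrozenSet[str]], int]:
    """Return (maximal frequent itemsets, number of tree nodes visited)."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tids: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tids.setdefault(item, set()).add(tid)

    candidates: Set[FrozenSet[str]] = set()
    visited = 0

    def dfs(head: FrozenSet[str], cover: Set[int], tail: List[str]) -> None:
        nonlocal visited
        visited += 1
        # Keep only the extensions that stay frequent together with the head.
        ext = [(x, cover & tids[x]) for x in tail]
        ext = [(x, c) for x, c in ext if len(c) >= min_support]
        if use_equivalence:
            # Assumed shortcut (not ESEquivPS itself): an extension whose
            # support equals the head's support occurs in every transaction
            # containing the head, so it is absorbed into the head instead
            # of opening a separate branch.
            absorbed = [x for x, c in ext if len(c) == len(cover)]
            head = head | frozenset(absorbed)
            ext = [(x, c) for x, c in ext if len(c) < len(cover)]
        if not ext:
            if head:
                candidates.add(head)  # leaf: no frequent extension is left
            return
        for i, (x, c) in enumerate(ext):
            # Lexicographic order: a child may only use items that follow x.
            dfs(head | {x}, c, [y for y, _ in ext[i + 1:]])

    dfs(frozenset(), set(range(len(transactions))), sorted(tids))
    # A leaf is maximal only if no other leaf strictly contains it.
    maximal = {s for s in candidates if not any(s < o for o in candidates)}
    return maximal, visited


if __name__ == "__main__":
    data = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]
    for flag in (False, True):
        itemsets, nodes = mine_maximal(data, min_support=2, use_equivalence=flag)
        print(flag, sorted(sorted(s) for s in itemsets), nodes)
```

Both settings return the same maximal itemsets; only the number of visited nodes differs. The final superset filter keeps the sketch simple and is quadratic in the number of leaves, so it is unsuitable for large inputs.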

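Authors: MA Zhixin (马志新), CHEN Xiaoyun (陈晓云), WANG Xue (王雪), LI Longjie (李龙杰)

Affiliation: School of Information Science and Engineering, Lanzhou University

Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报（自然科学版）》), 2005, Issue S1, pp. 1748-1752 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant No. 60473095)

Keywords: web mining; maximal frequent itemsets; pruning strategy; search space

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]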
