
Pruning strategy for mining maximal frequent itemsets (cited by: 5)
Abstract: Mining maximal frequent itemsets is a fundamental problem in many practical web mining applications. To reduce the search space effectively, this paper presents ESEquivPS (extension support equivalency pruning strategy), a new search space pruning strategy for mining maximal frequent itemsets. ESEquivPS is based on a depth-first traversal of the lexicographic subset enumeration tree and prunes the search space by exploiting the property that an item appearing in the extension sets of both a child node and its parent node has equal extension support in the two. The strategy is then applied to improve and optimize the MAFIA maximal frequent itemset algorithm. Experimental results show that the strategy effectively reduces the search space: on datasets that are sparse but contain long frequent itemsets, the search space is cut by two thirds, and the time efficiency improves by a factor of 3 to 5 over the original MAFIA algorithm.
Source: Journal of Tsinghua University (Science and Technology), 2005, Issue S1, pp. 1748-1752 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60473095)
Keywords: web mining; maximal frequent itemsets; pruning strategy; search space
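
The abstract describes pruning a depth-first search over a lexicographic subset enumeration tree by comparing supports. The abstract does not spell out the exact ESEquivPS rule, so the sketch below substitutes a related, well-known support-equality rule, MAFIA-style parent equivalence pruning: when extending the current itemset with an item leaves the support unchanged, the item is folded into the itemset instead of spawning a branch. The function name `mine_maximal`, the flag `use_equiv_pruning`, and the toy data `TOY` are hypothetical names chosen for illustration only.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def mine_maximal(
    transactions: Sequence[Set[str]],
    min_support: int,
    use_equiv_pruning: bool = True,
) -> Tuple[List[FrozenSet[str]], int]:
    """Return (maximal frequent itemsets, number of tree nodes visited).

    Illustrative sketch only: uses parent equivalence pruning as a
    stand-in, NOT the paper's ESEquivPS rule.
    """
    # Vertical layout: item -> set of transaction ids containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    maximal: List[FrozenSet[str]] = []
    visited = 0

    def dfs(head: FrozenSet[str], head_tids: Set[int], tail: List[str]) -> None:
        nonlocal visited
        visited += 1
        # Keep only tail items whose extension of `head` is still frequent.
        ext = []
        for item in tail:
            tids = head_tids & tidsets[item]
            if len(tids) >= min_support:
                ext.append((item, tids))
        if use_equiv_pruning:
            # Equal support means every transaction with `head` also has the
            # item, so the item can join `head` without a separate branch.
            same = [item for item, tids in ext if len(tids) == len(head_tids)]
            head = head | frozenset(same)
            ext = [(i, t) for i, t in ext if len(t) < len(head_tids)]
        if not ext:
            # Leaf: record it unless a known maximal itemset already covers it.
            if head and not any(head <= m for m in maximal):
                maximal.append(head)
            return
        for pos, (item, tids) in enumerate(ext):
            dfs(head | {item}, tids, [other for other, _ in ext[pos + 1:]])

    # Root: empty itemset, all transactions, items in lexicographic order.
    dfs(frozenset(), set(range(len(transactions))), sorted(tidsets))
    return maximal, visited


# Hypothetical toy data: item "b" occurs in every transaction.
TOY = [set("abcd"), set("abc"), set("abd"), set("bcd"), set("abcd")]

pruned_sets, pruned_nodes = mine_maximal(TOY, 3, use_equiv_pruning=True)
plain_sets, plain_nodes = mine_maximal(TOY, 3, use_equiv_pruning=False)
print(sorted("".join(sorted(s)) for s in pruned_sets), pruned_nodes, plain_nodes)
```

Both calls return the same maximal itemsets. The pruned run visits fewer nodes because the item present in every transaction is absorbed at the root rather than branched on. The paper reports its own search-space reduction for ESEquivPS on real datasets, which this toy comparison does not reproduce.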

References (10)

  • 1. Bayardo R. Efficiently mining long patterns from databases. Proceedings of the ACM SIGMOD International Conference on Management of Data. 1998
  • 2. Zaki M J, Gouda K. Fast vertical mining using diffsets. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003
  • 3. Lin D I, Kedem Z M. Pincer-search: A new algorithm for discovering the maximum frequent set. th Intl Conf Extending Database Technology. 1998
  • 4. Wang J, Han J, Pei J. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. Proceedings of the ACM SIGMOD Int Conf on Management of Data. 2003
  • 5. Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. Proceedings of the th Conference on Data Eng. 2001
  • 6. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. Proceedings of the ACM SIGMOD Int Conf on Management of Data. 2000
  • 7. Zaki M J, Hsiao C. Efficient algorithm for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering. 2005
  • 8. Rymon R. Search through systematic set enumeration. Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning. 1992
  • 9. Agarwal R, Aggarwal C, Prasad V. Depth first generation of long patterns. Proceedings of the ACM SIGMOD Int Conf on Management of Data. 2000
  • 10. Goethals B, Zaki M J. Advances in frequent itemset mining implementations. ACM SIGKDD Explorations Newsletter. 2004

Co-cited documents (37)

  • 1. Yan Yuejin, Li Zhoujun, Chen Huowang. A depth-first algorithm for mining maximal frequent itemsets [J]. Journal of Computer Research and Development, 2005, 42(3): 462-467. Cited by: 20
  • 2. Chen Jili, Niu Qinzhou. Web page deduplication based on feature codes [J]. Microcomputer Information, 2006, 22(03X): 113-115. Cited by: 11
  • 3. Zheng Dongdong, Cui Zhiming. Research on crawling strategies for Deep Web crawlers [J]. Computer Engineering and Design, 2006, 27(17): 3154-3158. Cited by: 13
  • 4. Liu Bing. Web Data Mining [M]. Beijing: Tsinghua University Press, 2009.
  • 5. Bright Planet. The deep web-surfacing the hidden value [EB/OL]. [2010-10-25]. http://www.brightplanet.com/the-deep-web/about-the-deep-web/.
  • 6. Cindy Xide Lin, Bo Zhao, Tim Weninger, et al. Entity Relation Discovery from Web Tables and Links [C]// Proceedings of WWW Conference 2010. New York: ACM Press, 2010: 1145-1146.
  • 7. Li Gui, Feng Jifang, Han Ziyang, et al. Web data extraction method based on table features. Computer Science, 2009, : 285-287.
  • 8. Wang Ranran, Wang Gang, Huang Qingsong. An information collection system based on the Deep Web [J]. Computer Technology and Development, 2007, 17(10): 171-173. Cited by: 3
  • 9. YANG Kai, MA Yuan. A fast algorithm for discovering maximum frequent itemsets [C]// Proc of the 21 th Int'l Conf on Communication Software and Networks. Xi'an, China, 2011: 434-438.
  • 10. HUANG Guoyang, WANG Libo, HU Changzhen, et al. An efficient algorithm based on time decay model for mining maximal frequent itemsets [C]// Proc of the 20th Int'l Conf on Machine Learning and Cybernetics. Perth, Australia, 2009: 2063-2066.

Citing documents (5)

Second-level citing documents (3)
