Maximum Frequent Tree Mining and its Applications

Cited by: 4

Abstract: Maximal frequent subtree mining has important applications in Web mining, HTML/XML document analysis, biomedical information processing and related fields, where it can be used to solve isomorphism problems. This paper presents a novel algorithm, Maximum Frequent Tree Mining (MFTM), that discovers maximal frequent subtrees in a forest. MFTM is based on the rightmost-path expansion technique. During the search it applies the proposed Overlay Theorem to prune candidates, which shrinks the search space and greatly accelerates convergence. Detailed experiments on performance and scalability demonstrate that MFTM is effective and scalable.
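The abstract names rightmost-path expansion as the core of MFTM's search but does not state the Overlay Theorem, so the Python sketch below illustrates only the generic ideas under explicit assumptions: patterns are ordered, labeled, rooted trees stored as pre-order `(depth, label)` sequences; support is the number of data trees that contain the pattern as an ordered induced subtree (the paper may use a different subtree notion, such as embedded subtrees); and maximality is checked by a naive post-filter instead of the Overlay Theorem pruning. All names (`to_tree`, `match_at`, `occurs`, `mine_frequent`, `maximal_only`) are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical representations (assumptions, not from the paper):
#   Tree: nested (label, [children]) tuple for an ordered, labeled, rooted tree.
#   Seq:  pre-order depth sequence ((depth, label), ...) with the root at depth 0.
Tree = Tuple[str, list]
Seq = Tuple[Tuple[int, str], ...]


def to_tree(seq: Seq) -> Tree:
    """Rebuild a nested tree from its pre-order depth sequence."""
    root: Tree = (seq[0][1], [])
    path = [root]  # path[d] = node at depth d on the current rightmost path
    for depth, label in seq[1:]:
        node: Tree = (label, [])
        del path[depth:]          # climb back up to the parent at depth - 1
        path[-1][1].append(node)  # attach as the parent's last child
        path.append(node)
    return root


def match_at(p: Tree, t: Tree) -> bool:
    """Ordered induced match of pattern p with its root mapped to data node t."""
    if p[0] != t[0]:
        return False
    j = 0
    for pc in p[1]:
        # Greedily map each pattern child to the earliest matching data child;
        # this preserves sibling order and never maps two children to one node.
        while j < len(t[1]) and not match_at(pc, t[1][j]):
            j += 1
        if j == len(t[1]):
            return False
        j += 1
    return True


def occurs(p: Tree, t: Tree) -> bool:
    """True if p occurs anywhere in t as an ordered induced subtree."""
    return match_at(p, t) or any(occurs(p, c) for c in t[1])


def labels_of(t: Tree) -> Iterator[str]:
    yield t[0]
    for c in t[1]:
        yield from labels_of(c)


def mine_frequent(forest: List[Tree], minsup: int) -> Dict[Seq, int]:
    """Level-wise search that grows patterns by rightmost-path expansion."""
    if minsup < 1:
        raise ValueError("minsup must be at least 1")
    labels = sorted({lab for t in forest for lab in labels_of(t)})
    frontier: List[Seq] = [((0, lab),) for lab in labels]
    frequent: Dict[Seq, int] = {}
    while frontier:
        next_frontier: List[Seq] = []
        for seq in frontier:
            pattern = to_tree(seq)
            sup = sum(occurs(pattern, t) for t in forest)
            if sup < minsup:
                continue  # anti-monotone: no extension of seq can be frequent
            frequent[seq] = sup
            # Rightmost-path expansion: the new node's depth ranges from 1
            # (child of the root) to last depth + 1 (child of the last node).
            for d in range(1, seq[-1][0] + 2):
                next_frontier.extend(seq + ((d, lab),) for lab in labels)
        frontier = next_frontier
    return frequent


def maximal_only(frequent: Dict[Seq, int]) -> Dict[Seq, int]:
    """Naive filter: keep patterns contained in no larger frequent pattern."""
    trees = {seq: to_tree(seq) for seq in frequent}
    return {
        seq: sup
        for seq, sup in frequent.items()
        if not any(len(o) > len(seq) and occurs(trees[seq], trees[o])
                   for o in frequent)
    }


if __name__ == "__main__":
    forest = [
        ("a", [("b", []), ("c", [])]),
        ("a", [("b", []), ("d", [])]),
        ("a", [("c", [])]),
    ]
    print(maximal_only(mine_frequent(forest, minsup=2)))
    # {((0, 'a'), (1, 'b')): 2, ((0, 'a'), (1, 'c')): 2}
```

Because every ordered tree has exactly one pre-order sequence, and dropping its last node (always a leaf) leaves a frequent prefix, extending only frequent patterns along the rightmost path enumerates each frequent pattern once. The quadratic `maximal_only` post-filter is where a pruning rule applied during the search, the role the abstract assigns to the Overlay Theorem, would save work; that rule is not reproduced here.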

Authors: Yang Pei, Tan Qi

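Affiliations: Institute of Computer Applications, South China University of Technology; School of Computer Science, South China Normal University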

Source: Computer Science (indexed in CSCD and the Peking University Core Journals list), 2008, No. 2, pp. 150-153 (4 pages)

Funding: National Natural Science Foundation of China (60003019)

Keywords: frequent tree mining; Web mining; data extraction

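CLC numbers: TP311.13 [Automation and Computer Technology / Computer Software and Theory]; TP311 [Automation and Computer Technology / Computer Software and Theory]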

Citation network

References (12)

[1] Cooley R, Mobasher B, Srivastava J. Web Mining: Information and Pattern Discovery on the World Wide Web. In: 8th IEEE Int'l Conf. on Tools with AI, 1997.
[2] Li Q, Moon B. Indexing and querying XML data for regular path expressions. In: 27th Int'l Conf. on Very Large Data Bases, 2001.
[3] Shapiro B, Zhang K. Comparing multiple RNA secondary structures using tree comparisons. Computer Applications in Biosciences, 1990, 6(4): 309-318.
[4] Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In: 4th European Conf. on Principles of Knowledge Discovery and Data Mining, Sep. 2000.
[5] Kuramochi M, Karypis G. Frequent subgraph discovery. In: 1st IEEE Int'l Conf. on Data Mining, Nov. 2001.
[6] Cook D, Holder L. Substructure discovery using minimal description length and background knowledge. Journal of Artificial Intelligence Research, 1994, 1: 231-255.
[7] Yoshida K, Motoda H. CLIP: Concept learning from inference patterns. Artificial Intelligence, 1995, 75(1): 63-92.
[8] Asai T, Abe K, Kawasoe S, et al. Efficient substructure discovery from large semi-structured data. In: 2nd SIAM Int'l Conf. on Data Mining, Apr. 2002.
[9] Zaki M J. Efficiently mining frequent trees in a forest. In: SIGKDD 2002, Edmonton, Alberta, Canada.
[10] Yang Pei, Zheng Qilun, Peng Hong, Li Yingji. PFTM: a projection-based frequent subtree mining algorithm [J]. Computer Science, 2005, 32(2): 206-209 (in Chinese). Cited by: 5

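Cited by (4)

[1] Chen Dongju, Zhang Dongzhan, Duan Jiangjiao. A maximum frequent subtree mining algorithm based on subtree constraints [J]. Modern Computer, 2010, 16(5): 25-29.
[2] Guo Xin, Dong Jianfeng, Zhou Qingping. Frequent subtree mining algorithm for dynamic databases [J]. Computer Science, 2011, 38(5): 138-141.
[3] Xia Ying, Li Hongxu. Frequent subtree mining method based on covering patterns [J]. Journal of Computer Applications, 2017, 37(9): 2439-2442. Cited by: 2
[4] Tang Dequan, Huang Jingui. Research on maximal frequent subtree mining algorithms based on graph data [J]. Microelectronics & Computer, 2020, 37(10): 54-58. Cited by: 1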

