An Algorithm of Mining Frequent Subtrees Based on Projection and Encoding
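Abstract
Frequent subtree mining is widely used in Web mining, bioinformatics, XML data mining, and related fields. This paper proposes a new algorithm, PETreeMiner. The algorithm mines frequent subtrees using prefix projection, a technique from sequence mining that avoids candidate generation. A scope attribute is added to each node in the preorder traversal sequence of a tree, and encoding is performed during projection, so that each frequent subsequence found corresponds directly to a frequent subtree. Experimental results show that the algorithm outperforms the other algorithms it was compared with.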
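The two ingredients named in the abstract, a preorder sequence annotated with node scopes and prefix projection, can be illustrated with a minimal sketch. This is not the PETreeMiner implementation, whose details the abstract does not give. The tree representation `(label, children)`, the function names, and the definition of scope as the preorder position of a node's last descendant (a convention common in the tree-mining literature) are all assumptions made for illustration.

```python
from collections import Counter

# Assumed representation: a tree is a (label, [child trees]) tuple.

def preorder_with_scope(tree):
    """Return a list of (label, position, scope) entries in preorder.

    scope is assumed to be the preorder position of the node's last
    descendant (equal to the node's own position for a leaf).
    """
    out = []

    def visit(node):
        label, children = node
        pos = len(out)
        out.append(None)  # reserve the slot; scope is known after the subtree
        for child in children:
            visit(child)
        out[pos] = (label, pos, len(out) - 1)

    visit(tree)
    return out

def is_ancestor(a, b):
    """True if entry a is a proper ancestor of entry b in the same tree."""
    return a[1] < b[1] <= a[2]

def frequent_labels(sequences, min_support):
    """Count each label once per sequence and keep those with enough support."""
    counts = Counter()
    for seq in sequences:
        counts.update({label for label, _, _ in seq})
    return {label: c for label, c in counts.items() if c >= min_support}

def project(sequences, label):
    """Prefix projection: keep the suffix after the first occurrence of label."""
    projected = []
    for seq in sequences:
        for i, entry in enumerate(seq):
            if entry[0] == label:
                projected.append(seq[i + 1:])
                break
    return projected

# Two small hypothetical trees: in t1, C is a child of B; in t2, B and C are siblings.
t1 = ("A", [("B", [("C", [])]), ("D", [])])
t2 = ("A", [("B", []), ("C", [])])
s1 = preorder_with_scope(t1)
s2 = preorder_with_scope(t2)

print(s1)
print(frequent_labels([s1, s2], min_support=2))
print(project([s1, s2], "B"))
print(is_ancestor(s1[1], s1[2]), is_ancestor(s2[1], s2[2]))
```

Both trees contain the label subsequence B, C in preorder, but only in `t1` is B an ancestor of C. The scope values make that difference recoverable from the sequence alone, which is why a plain sequence miner needs this extra attribute before a frequent subsequence can be read back as a subtree.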
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2006, issue z3, pp. 389-394 (6 pages)
Funding
Yanshan University Doctoral Fund Project (B83)