A frequent subtree-based indexing method (一种基于频繁子树的数据库索引方法)


Abstract: A new indexing method is proposed for databases of labeled, rooted, unordered trees. The method first mines all frequent subtrees and selects the discriminative ones among them as index features. The subtrees in the feature set are then converted into sequences, and the index is organized as a prefix tree. An algorithm for searching this index tree is also given, with two optimizations that reduce the search space: Apriori pruning and maximal discriminative subtrees. Experimental results show that, compared with other path-based indexing methods, the frequent subtree-based index performs better in both index size and query cost.
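The pipeline summarized in the abstract (subtree features converted to sequences and held in a prefix tree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual algorithm: the `(label, children)` tree representation, the `$` backtrack marker, the sorted depth-first encoding used as a canonical form, and the names `canonical_sequence`, `FeatureIndex`, `add_feature`, `lookup`, and `candidates`. Frequent-subtree mining, discriminative selection, and subtree containment testing are not shown; the feature set and the ids of the trees containing each feature are assumed to be given.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

END = "$"  # hypothetical backtrack marker; assumed not to occur as a node label


def canonical_sequence(tree) -> Tuple[str, ...]:
    """Encode a rooted unordered labeled tree, given as (label, children),
    as a sequence that does not depend on the order of the children."""
    label, children = tree
    # Sorting the children's encodings removes the dependence on child order.
    encoded = sorted(canonical_sequence(child) for child in children)
    seq = [label]
    for child_seq in encoded:
        seq.extend(child_seq)
    seq.append(END)
    return tuple(seq)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Ids of database trees containing the feature that ends here;
        # None means no feature ends at this node.
        self.tree_ids: Optional[Set[int]] = None


class FeatureIndex:
    """Prefix tree over the canonical sequences of the index features."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add_feature(self, feature, tree_ids: Iterable[int]) -> None:
        node = self.root
        for symbol in canonical_sequence(feature):
            node = node.children.setdefault(symbol, TrieNode())
        node.tree_ids = set(tree_ids)

    def lookup(self, feature) -> Optional[Set[int]]:
        node = self.root
        for symbol in canonical_sequence(feature):
            node = node.children.get(symbol)
            if node is None:
                return None  # the sequence is not in the index
        return node.tree_ids

    def candidates(self, query_features, all_ids: Iterable[int]) -> Set[int]:
        """Intersect the id sets of the indexed features found in the query.
        Features that are not indexed impose no constraint."""
        result = set(all_ids)
        for feature in query_features:
            ids = self.lookup(feature)
            if ids is not None:
                result &= ids
        return result


# Usage: two features sharing the prefix "a", and a query containing both.
index = FeatureIndex()
f1 = ("a", [("b", [])])
f2 = ("a", [("c", [])])
index.add_feature(f1, {1, 2, 3})
index.add_feature(f2, {2, 3, 4})
print(index.candidates([f1, f2], {1, 2, 3, 4, 5}))
```

The candidate set returned by `candidates` would still need to be verified with an exact subtree containment test, which this sketch leaves out.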

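Author: Wang Tao (王涛)

Affiliation: School of Computer Science and Technology, Hubei University of Economics (湖北经济学院计算机科学与技术学院)

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报（自然科学版）》), 2008, No. 3, pp. 103-106 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).

Keywords: data mining; frequent subtree; database indexing; subtree search; index tree

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]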



