基于F&B索引的XML查询处理算法被引量：2

Processing XPath over F&B-Index

下载PDF

导出

摘要 XML已成为信息交换和表示的标准.对XML数据的查询将返回满足特定约束的XML节点子集.对于大文件的XML数据的查询处理通常分为两步:1.为该XML数据建立一个索引;2.在索引上完成查询处理无需访问源文档.XML索引为查询处理提供了高效的帮助,其中F&B索引是已知的处理分枝查询最小的索引,但快速创建F&B索引和利用F&B索引完成查询处理的算法却很少有人研究.提出了一种素数序列标记法,这种标记法不仅有助于快速地建立F&B索引,更可以高效地完成F&B索引上的查询处理.此外,还给出了F&B索引上的区间标记法与CCPI的创建过程,这两种编码创建过程无需在建立F&B索引后二次创建,仅需与F&B索引创建过程一起对文档使用SAX解析器分析一次即可得到.这样,可以在F&B索引的区间标记法上使用TwigStack算法执行查询处理,在F&B索引的CCPI标记法上使用关联路径连接算法执行查询处理.还给出了基于素数序列标记法的查询处理算法,即素数整除匹配算法,该算法可以高效地判定某节点是否有某分枝子结构.实验表明基于素数序列标记法的F&B索引创建方法比SAM算法快,在多个数据集F&B索引上素数整除匹配算法优于关联路径连接算法和TwigStack算法. XML is widely used as a standard of information exchange and representation. Queries on XML can retrieve a subset of XML data nodes satisfying certain constraints. Queries on large XML data are usually processed in two steps： 1. An index of XML nodes is created; 2. Queries are processed on that index without accessing XML data. XML index provides high efficiency for XML query processing. Particularly, F＆B-index is the smallest index that supports twig query processing. However, few researches are proposed on how to efficiently create F＆B-Index and how to process queries based on F＆B-Index. Proposed in this paper is a new labeling scheme called prime sequence. This labeling scheme helps not only on creating an F＆B-Index but also on efficient query processing. With prime sequence labeling, an F＆B-index can be created by parsing the XML document only once with a SAX parser. Further, region encoding and CCPI on F＆B-Index can be created during the creation of F＆B-Index. Thus, TwigStack algorithm and related-path-join algorithm can be used to process queries on created F＆B-Index. Also proposed is an efficient algorithm named division match ,over F＆B-Index. The algorithm can efficiently judge relationship between two nodes based on a property of prime sequence labeling. Experiments show that prime sequence labeling provides high efficiency on creating F＆B-Index and high efficiency on query processing on different datasets.

作者王洪强李建中王宏志

机构地区哈尔滨工业大学计算机学院

出处《计算机研究与发展》 EI CSCD 北大核心 2010年第5期866-877,共12页 Journal of Computer Research and Development

基金国家自然科学基金项目(60473075) 黑龙江省自然科学基金项目(zjg03-05)

关键词 XML 索引 F&B索引素数序列标记法 CCPI TwigStack XML index F＆B-index prime sequence labeling CCPI TwigStack

分类号 TP311 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献20

1Tova Milo and Dan Suciu.Index structures for path expressions[G] //LNCS 1540:Proc of the 7th Int Conf on Database Theory.Berlin:Springer,1999:277-295.
2Kaushik R,Bohannon P,Naughton J F,et al.Covering indexes for branching path queries[C] //Proc of the 2002 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2002:133-144.
3Wang Wei,Wang Hongzhi,Lu Hongjun,et al.Efficient processing of XML path queries using the disk-based F&B index[C] //Proc of the 31st Int Conf on Very Large Data Bases.New York:ACM,2005:145-156.
4Liu Xianmin,Li Jianzhong,Wang Hongzhi.SAM:An efficient algorithm for F&B-index construction[G] //LNCS 4505:Proc of Joint the 9th Int Conf on Asia-Pacific Web Conference and 8th Int Conf on Web-Age Information Management.Berlin:Springer,2007:697-708.
5Bruno N,Koudas N,Srivastava D.Holistic twig joins:Optimal xml pattern matching[C] //Proc of the 2002 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2002:310-321.
6Wang Hongqiang,Li Jianzhong,Wang Hongzhi.Clustered chain path index for XML document:Efficiently processing branch queries[G] //LNCS 4255:Proc of the 7th Int Conf on Web Information Systems Engineering.Berlin:Springer,2006:474-486.
7Miklau G,Suciu D.Containment and equivalence for an xpath fragment[C] //Proc of the 21st ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems.New York:ACM,2002:65-76.
8Cormen T H,Leiserson C E,Rivest R L,et al.Introduction to algorithms[M].2nd ed.Boston:MIT Press,2001:790-793.
9Xmark:The xml-benchmark project[OL].[2008-12-20].http://monetdb.cwi.nl/xml.
10Dblp dataset[OL].[2008-12-20].http://dblp.uni-trier.de/xml/.

二级参考文献15

1B F Cooper, N Sample, M J Franklin, et al. A fast index for semistructured data [C]. The 27th Int'l Conf on Very Large Data Bases, Roma, Italy, 2001
2Li Quanzhong, Moon Bongki. Indexing and querying XML data for regular path expressions [C]. The 27th Int'l Conf on Very Large Data Bases, Roma, Italy, 2001
3Buneman Peter, Grohe Martin, Koch Christoph. Path queries on compressed XML [C]. The 29th Int'l Conf on Very Large Data Bases, Berlin, Germany, 2003
4M Barg, R K Wong. A fast and versatile path index for querying semi-structured data [C]. The 8th Int'l Conf on Database Systems for Advanced Applications, Kyoto, Japan,2003
5Jiang Haifeng, Lu Hongjun, Wang Wei. XR-tree: Indexing XML data for efficient structural join [C]. The 19th Int'l Conf on Data Engineering, Bangalore, India, 2003
6Kim Jongik, Kim Hyoung-Joo. Efficient processing of regular path joins using PID [J]. Information and Software Technology, 2003, 45(5): 241-251
7Zhang Chun, J F Naughton, D J DeWitt, et al. On supporting containment queries in relational database management systems[C]. The 2001 ACM SIGMOD Int'l Conf on Management of Data, Santa Barbara, California, 2001
8N Bruno, N Koudas, D Srivastava. Holistic twig joins: Optimal XML pattern matching [C]. The 2002 ACM SIGMOD Int'l Conf on Management of Data, Madison, Wisconsin, USA,2002
9Wang Haixun, Park Sanghyun, Fan Wei, et al. ViST: A dynamic index method for querying XML data by tree structures[C]. The 2003 ACM SIGMOD Int'l Conf on Management of Data, San Diego, CA, 2003
10Wang Wei, Jiang Haifeng, Lu Hongjun. PbiTree coding and efficient processing of containment Join [C]. The 19th Int'l Conf on Data Engineering, Bangalore, India, 2003

共引文献1

1周舟,刘先锋,刘萍,张楚才.完全支持数据更新的XML压缩编码[J].湖南师范大学自然科学学报,2011,34(6):21-25. 被引量：1

同被引文献19

1付林林,廖湖声,高红雨,陈荣鑫.采用流水线方式的XML整体小枝查询方案[J].计算机研究与发展,2011,48(S3):105-113. 被引量：1
2易平,胡运安,陈福生,张世永.基于PATRICIA-TRIES的XML路径索引设计[J].小型微型计算机系统,2006,27(3):474-480. 被引量：2
3Jiadong Ren Xiaopeng Yin Xiaodan Guo.A Dynamic Labeling Scheme for XML Document[J].通讯和计算机（中英文版）,2006,3(5):61-65. 被引量：5
4Chen Z,Gehrke J,Korn F,et al.Index structures for matching XMLtwigs using relational query processors[J].Data&Knowledge Engineering,2007,60(2):283-302.
5Mohammad S,Martin P,Powley W.Relational universal index structure for evaluating XML twig queries[C]//Proceedings of Communications and Information Technology(ICCIT).Aqaba:IEEE,2011:116-120.
6Yun J H,Chung C W.Efficient probabilistic XML query processing using an extended labeling scheme and a lightweight index[J].Information Processing&Management,2012,48(6):1181-1202.
7Cheng R,Xia Y,Prabhakar S,et al.Efficient indexing methods for probabilistic threshold queries over uncertain data[C]//Proceedings of the Thirtieth international conference on Very large data bases-Volume30.Hong Kong:VLDB Endowment,2004:876-887.
8Tao Y,Cheng R,Xiao X,et al.Indexing multi-dimensional uncertain data with arbitrary probability density functions[C]//Proceedings of the 31st international conference on Very large data bases.Hong Kong:VLDB Endowment,2005:922-933.
9Abiteboul S,Chan T H,Kharlamov E.Aggregate queries for discrete and continuous probabilistic XML[C]//Proceedings of the 13th International Conference on Database Theory.Lausanne:ACM Press,2010:50-61.
10Ning B,Liu C,Yu J X,et al.Matching top-k answers of twig patterns in probabilistic XML[J].Lecture Notes in Computer Science,2010,5981:125-139.

引证文献2

1雷梁,李刚,谢美英.支持数据动态更新的XML编码研究[J].信阳师范学院学报（自然科学版）,2015,28(2):283-287.
2张晓琳,郭丹丹,韩雨童,郝琨,谭跃生.RLPI索引:一种处理连续不确定XML索引[J].计算机应用与软件,2016,33(4):25-29.

1王洪强,李建中,王宏志.基于素数序列标记法的XML查询处理算法[J].哈尔滨商业大学学报（自然科学版）,2009,25(5):540-545.
2张青平,蒋夏军,皮德常.一种新的基于流过滤的Twig模式匹配算法[J].计算机与数字工程,2014,42(5):751-756.
3金洪西.SAX解析器的实现算法和具体程序分析[J].黑龙江科技信息,2010(15):75-75.
4任海珍.组播协议在以太网中的应用研究[J].数码世界（A）,2006,5(11A):24-25.
5曹品军.数字水印技术在电子文档管理中的应用[J].电脑知识与技术（过刊）,2015,21(1X):156-157. 被引量：2
6一江春水.手工编号快速变更为自动编号[J].电脑迷,2008,0(22):71-71.
7ZHOU Junfeng,XIE Min,MENG Xiaofeng.TwigStack^+:Holistic Twig Join Pruning Using Extended Solution Extension[J].Wuhan University Journal of Natural Sciences,2007,12(5):855-860. 被引量：1
8赵逸智,何志学,张云峰.基于模式的XML Twig查询处理方法[J].北华航天工业学院学报,2016,26(5):16-19.
9姚美玲,刘贵全,陈恩红.一种改进的XML枝匹配算法[J].中国科学技术大学学报,2010,40(11):1180-1185.
10汪万根.XML数据流中的后兄弟查询算法[J].计算机工程,2009,35(8):107-109.

计算机研究与发展

2010年第5期

浏览历史

内容加载中请稍等...

基于F&B索引的XML查询处理算法被引量：2

参考文献20

二级参考文献15

共引文献1

同被引文献19

引证文献2

相关作者

相关机构

相关主题

浏览历史

基于F&B索引的XML查询处理算法 被引量：2

参考文献20

二级参考文献15

共引文献1

同被引文献19

引证文献2

相关作者

相关机构

相关主题

浏览历史

基于F&B索引的XML查询处理算法被引量：2