期刊文献+

基于F&B索引的XML查询处理算法 被引量:2

Processing XPath over F&B-Index
下载PDF
导出
摘要 XML已成为信息交换和表示的标准.对XML数据的查询将返回满足特定约束的XML节点子集.对于大文件的XML数据的查询处理通常分为两步:1.为该XML数据建立一个索引;2.在索引上完成查询处理无需访问源文档.XML索引为查询处理提供了高效的帮助,其中F&B索引是已知的处理分枝查询最小的索引,但快速创建F&B索引和利用F&B索引完成查询处理的算法却很少有人研究.提出了一种素数序列标记法,这种标记法不仅有助于快速地建立F&B索引,更可以高效地完成F&B索引上的查询处理.此外,还给出了F&B索引上的区间标记法与CCPI的创建过程,这两种编码创建过程无需在建立F&B索引后二次创建,仅需与F&B索引创建过程一起对文档使用SAX解析器分析一次即可得到.这样,可以在F&B索引的区间标记法上使用TwigStack算法执行查询处理,在F&B索引的CCPI标记法上使用关联路径连接算法执行查询处理.还给出了基于素数序列标记法的查询处理算法,即素数整除匹配算法,该算法可以高效地判定某节点是否有某分枝子结构.实验表明基于素数序列标记法的F&B索引创建方法比SAM算法快,在多个数据集F&B索引上素数整除匹配算法优于关联路径连接算法和TwigStack算法. XML is widely used as a standard of information exchange and representation. Queries on XML can retrieve a subset of XML data nodes satisfying certain constraints. Queries on large XML data are usually processed in two steps: 1. An index of XML nodes is created; 2. Queries are processed on that index without accessing XML data. XML index provides high efficiency for XML query processing. Particularly, F&B-index is the smallest index that supports twig query processing. However, few researches are proposed on how to efficiently create F&B-Index and how to process queries based on F&B-Index. Proposed in this paper is a new labeling scheme called prime sequence. This labeling scheme helps not only on creating an F&B-Index but also on efficient query processing. With prime sequence labeling, an F&B-index can be created by parsing the XML document only once with a SAX parser. Further, region encoding and CCPI on F&B-Index can be created during the creation of F&B-Index. Thus, TwigStack algorithm and related-path-join algorithm can be used to process queries on created F&B-Index. Also proposed is an efficient algorithm named division match ,over F&B-Index. The algorithm can efficiently judge relationship between two nodes based on a property of prime sequence labeling. Experiments show that prime sequence labeling provides high efficiency on creating F&B-Index and high efficiency on query processing on different datasets.
出处 《计算机研究与发展》 EI CSCD 北大核心 2010年第5期866-877,共12页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60473075) 黑龙江省自然科学基金项目(zjg03-05)
关键词 XML 索引 F&B索引 素数序列标记法 CCPI TwigStack XML index F&B-index prime sequence labeling CCPI TwigStack
  • 相关文献

参考文献20

  • 1Tova Milo and Dan Suciu.Index structures for path expressions[G] //LNCS 1540:Proc of the 7th Int Conf on Database Theory.Berlin:Springer,1999:277-295.
  • 2Kaushik R,Bohannon P,Naughton J F,et al.Covering indexes for branching path queries[C] //Proc of the 2002 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2002:133-144.
  • 3Wang Wei,Wang Hongzhi,Lu Hongjun,et al.Efficient processing of XML path queries using the disk-based F&B index[C] //Proc of the 31st Int Conf on Very Large Data Bases.New York:ACM,2005:145-156.
  • 4Liu Xianmin,Li Jianzhong,Wang Hongzhi.SAM:An efficient algorithm for F&B-index construction[G] //LNCS 4505:Proc of Joint the 9th Int Conf on Asia-Pacific Web Conference and 8th Int Conf on Web-Age Information Management.Berlin:Springer,2007:697-708.
  • 5Bruno N,Koudas N,Srivastava D.Holistic twig joins:Optimal xml pattern matching[C] //Proc of the 2002 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2002:310-321.
  • 6Wang Hongqiang,Li Jianzhong,Wang Hongzhi.Clustered chain path index for XML document:Efficiently processing branch queries[G] //LNCS 4255:Proc of the 7th Int Conf on Web Information Systems Engineering.Berlin:Springer,2006:474-486.
  • 7Miklau G,Suciu D.Containment and equivalence for an xpath fragment[C] //Proc of the 21st ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems.New York:ACM,2002:65-76.
  • 8Cormen T H,Leiserson C E,Rivest R L,et al.Introduction to algorithms[M].2nd ed.Boston:MIT Press,2001:790-793.
  • 9Xmark:The xml-benchmark project[OL].[2008-12-20].http://monetdb.cwi.nl/xml.
  • 10Dblp dataset[OL].[2008-12-20].http://dblp.uni-trier.de/xml/.

二级参考文献15

  • 1B F Cooper, N Sample, M J Franklin, et al. A fast index for semistructured data [C]. The 27th Int'l Conf on Very Large Data Bases, Roma, Italy, 2001
  • 2Li Quanzhong, Moon Bongki. Indexing and querying XML data for regular path expressions [C]. The 27th Int'l Conf on Very Large Data Bases, Roma, Italy, 2001
  • 3Buneman Peter, Grohe Martin, Koch Christoph. Path queries on compressed XML [C]. The 29th Int'l Conf on Very Large Data Bases, Berlin, Germany, 2003
  • 4M Barg, R K Wong. A fast and versatile path index for querying semi-structured data [C]. The 8th Int'l Conf on Database Systems for Advanced Applications, Kyoto, Japan,2003
  • 5Jiang Haifeng, Lu Hongjun, Wang Wei. XR-tree: Indexing XML data for efficient structural join [C]. The 19th Int'l Conf on Data Engineering, Bangalore, India, 2003
  • 6Kim Jongik, Kim Hyoung-Joo. Efficient processing of regular path joins using PID [J]. Information and Software Technology, 2003, 45(5): 241-251
  • 7Zhang Chun, J F Naughton, D J DeWitt, et al. On supporting containment queries in relational database management systems[C]. The 2001 ACM SIGMOD Int'l Conf on Management of Data, Santa Barbara, California, 2001
  • 8N Bruno, N Koudas, D Srivastava. Holistic twig joins: Optimal XML pattern matching [C]. The 2002 ACM SIGMOD Int'l Conf on Management of Data, Madison, Wisconsin, USA,2002
  • 9Wang Haixun, Park Sanghyun, Fan Wei, et al. ViST: A dynamic index method for querying XML data by tree structures[C]. The 2003 ACM SIGMOD Int'l Conf on Management of Data, San Diego, CA, 2003
  • 10Wang Wei, Jiang Haifeng, Lu Hongjun. PbiTree coding and efficient processing of containment Join [C]. The 19th Int'l Conf on Data Engineering, Bangalore, India, 2003

共引文献1

同被引文献19

  • 1付林林,廖湖声,高红雨,陈荣鑫.采用流水线方式的XML整体小枝查询方案[J].计算机研究与发展,2011,48(S3):105-113. 被引量:1
  • 2易平,胡运安,陈福生,张世永.基于PATRICIA-TRIES的XML路径索引设计[J].小型微型计算机系统,2006,27(3):474-480. 被引量:2
  • 3Jiadong Ren Xiaopeng Yin Xiaodan Guo.A Dynamic Labeling Scheme for XML Document[J].通讯和计算机(中英文版),2006,3(5):61-65. 被引量:5
  • 4Chen Z,Gehrke J,Korn F,et al.Index structures for matching XMLtwigs using relational query processors[J].Data&Knowledge Engineering,2007,60(2):283-302.
  • 5Mohammad S,Martin P,Powley W.Relational universal index structure for evaluating XML twig queries[C]//Proceedings of Communications and Information Technology(ICCIT).Aqaba:IEEE,2011:116-120.
  • 6Yun J H,Chung C W.Efficient probabilistic XML query processing using an extended labeling scheme and a lightweight index[J].Information Processing&Management,2012,48(6):1181-1202.
  • 7Cheng R,Xia Y,Prabhakar S,et al.Efficient indexing methods for probabilistic threshold queries over uncertain data[C]//Proceedings of the Thirtieth international conference on Very large data bases-Volume30.Hong Kong:VLDB Endowment,2004:876-887.
  • 8Tao Y,Cheng R,Xiao X,et al.Indexing multi-dimensional uncertain data with arbitrary probability density functions[C]//Proceedings of the 31st international conference on Very large data bases.Hong Kong:VLDB Endowment,2005:922-933.
  • 9Abiteboul S,Chan T H,Kharlamov E.Aggregate queries for discrete and continuous probabilistic XML[C]//Proceedings of the 13th International Conference on Database Theory.Lausanne:ACM Press,2010:50-61.
  • 10Ning B,Liu C,Yu J X,et al.Matching top-k answers of twig patterns in probabilistic XML[J].Lecture Notes in Computer Science,2010,5981:125-139.

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部