摘要
首先给出了XML文档树、元素外延和名字路径等的形式化定义.接着,将编码方案、路径索引和名字外延的思想相结合,提出了一种改进的XML数据的索引结构(类型索引集、名字索引集和外延索引),解决了基于传统索引技术的XML数据查询方法性能上的不足.它既可以有效地支持结构连接的计算以快速地判断任意结点之间的子孙后代关系,也可以有效地支持基于名字外延的路径连接算法以快速地判断任意结点之间的父子关系,然后还可以快速地支持对包含拥有关系的小枝查询;进而给出了基于该索引结构的外延连接算法,并着重对其处理含有父子关系和拥有关系等较复杂的XPath查询路径的不同处理过程进行了对比和分析,使得对于一条长度为n的XPath绝对路径查询,最多只需要n/2-1次外延连接,且能够根据双亲结构信息等利用外延索引尽可能跳过不需要参与连接的结点.实验结果表明,提出的新的索引结构可以有效地提高查询处理的性能.
Firstly, formal definitions of XML tree data model, element extension, name path, etc are given in context environment. Secondly, an improved index structure, including type index set, name index set and extension index, is proposed to retrieve XML data based on the idea of the numbering scheme, the path index and name extension. The index structure solves the problem of the poor performance of XML query based on conventional index techniques. It can not only quickly determine ancestor/descendant relationships by supporting the structural join algorithm, but also quickly determine parent/child relationships by supporting the path join algorithm based on name extension, and meanwhile effectively process twig query including holding relationships. Finally, an extension join algorithm based on this index structure is proposed to process XPath path query efficiently, in which comparisons and analyses among the different processing approaches for complicated XPath path expressions with parent/child and holding relationships are conducted. For an XPath absolute path query with n nodes, the extension join is needed to execute for n/2-1 times at most by avoiding scanning each node which does not participate in the join via extension index based on structure information, etc. Experimental results show that the new index structure can effectively enhance the query performance for XML data.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2008年第6期1043-1055,共13页
Journal of Computer Research and Development
基金
国家“九七三”重点基础研究发展规划基金项目(2004CB719401)~~