摘要
近年来,在XML查询处理方法中发表了一些基于节点流栈连接的高效的分枝连接算法。然而,这些算法普遍存在这样的问题:由于它们必须扫描查询中出现的每一个元素对应的节点流,当XML节点数量很大时,查询处理的输入代价很大,效率变得低下。为了解决这个问题,提出了一个新型的标记法记为区间路径,不同于节点流的区间标记法,区间路径可以把具有相同路径的节点集索引到一个集合中。继而提出了分枝点连接算法用于XML查询处理。同基于节点流栈的分枝连接算法相比,该算法有以下优势:节点集的祖先信息直接位于区间路径中;只有和查询结果相关的节点集会被扫描到,大大降低了输入代价;支持查询通配符;对于类型为根路径的查询,只需一次输入操作代价完成查询处理。实验结果表面该算法在输入代价,执行时间和延展性方面都优于基于节点流的分枝连接算法。
In recent years,a number of stack-based twig join algorithms have been proposed to process XQueries based on region encoded tag streams.However,these algorithms are I/O sensitive and inefficient for large documents because they need to scan and process all nodes whose tags appear in a given query.To address this problem,this paper proposes a novel labeling scheme called RegionPath.Unlike previous schemes for query processing,this scheme can group all nodes with the same path into one label.Based on the labeling,an efficient algorithm,called BranchPointJoin,is proposed to process queries.Compared with existing stack-based algorithms,our algorithm has four main advantages:one is the ancestors of the same structured nodes can be obtained from the labels directly;the second is only part of nodes associated with query are scanned and joined;the third queries with wild-cards are supported;and the forth is only one I/O cost is needed for each root path query.The experimental results on various datasets and queries show that the proposed algorithm outperforms stack based methods on I/O cost,execution time and scalability.
出处
《信息技术》
2011年第6期105-108,111,共5页
Information Technology