Holistic twig query processing techniques based on region encoding have been developed to minimize the intermedi-ate results,namely,those root-to-leaf path matches that are not in the final twig results. These algorit...Holistic twig query processing techniques based on region encoding have been developed to minimize the intermedi-ate results,namely,those root-to-leaf path matches that are not in the final twig results. These algorithms have to scan all the streams of tags in query patterns. However,useless path matches cannot be completely avoided. TJFast which is based on the labeling scheme of Extended Dewey has been proposed to avoid useless intermedi-ate results,and it only needs to access the labels of the leaf query nodes. However,it don't concern about the characteristics of ele-ments with the same parent,and it has to merge join all the inter-mediate results which are evaluated during the first phrase. We propose a new labeling scheme to compress the XML elements which have the same characteristic. Based on the compressed path-labeled streams,a new novel holistic twig query algorithm named CPJoin is designed. Finally,implementation results are provided to show that CPJoin has good performance on both real and synthetic data.展开更多
As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natu...As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this feature that makes it infeasible in many actual scenarios. This has driven the requirement of relaxing the complete specification of a twig pattern to express more flexible semantic constraints in a single query expression. In this paper, we focus on query evaluation of partially specified twig pattern (PSTP) queries, through which we can reap the most flexibility of specifying partial semantic constraints in a query expression. We propose an extension to XPath through introducing two Samepath axes to support partial semantic constraints in a concise but effective way. Then we propose a stack based algorithm, pTwigStack, to process a PSTP holistically without deriving the concrete twig patterns and then processing them one by one. Further, we propose two DTD schema based optimization methods to improve the performance of pTwigStack algorithm. Our experimental results on various datasets indicate that our method performs significantly better than existing ones when processing PSTPs.展开更多
针对传统XML文档小枝模式查询算法中,与模式树中标签名相同的节点均入内存,易造成很大的空间浪费问题,提出了一种新的算法—StreamFWM(StreamFilter Without Merging)。StreamFWM采用区间编码方式,依据节点间的结构关系过滤标签流中无...针对传统XML文档小枝模式查询算法中,与模式树中标签名相同的节点均入内存,易造成很大的空间浪费问题,提出了一种新的算法—StreamFWM(StreamFilter Without Merging)。StreamFWM采用区间编码方式,依据节点间的结构关系过滤标签流中无用的中间节点,且不用归并,只用简单的栈和列表实现。实验结果证明,算法StreamFWM相比TwigStack在查询处理的性能上有所提高。展开更多
Tree Match算法是一种有效的Twig查询匹配算法,但其存在反复分析Twig模式的缺点。针对该问题,引入编译中的部分求值技术,提出一种Twig查询优化方案。通过部分求值提前完成对Twig模式的分析,生成查询专用的指令序列代替原查询程序,并给...Tree Match算法是一种有效的Twig查询匹配算法,但其存在反复分析Twig模式的缺点。针对该问题,引入编译中的部分求值技术,提出一种Twig查询优化方案。通过部分求值提前完成对Twig模式的分析,生成查询专用的指令序列代替原查询程序,并给出查询机执行引擎,从而消除重复计算,优化XML树模式查询过程。实验结果表明,在不同Twig模式下,该优化方案能够有效提高XML查询的执行效率。展开更多
Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-p...Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture paths of all elements, and a twig query is modeled as an XPattern extended from tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. According to these measures, an effiective algorithm to answer for a twig query, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries with branch ([ ]), child edge (/), descendant edge (//), and wildcard (*) synthetically. We show that path-partitioned encoding scheme and XPattern guarantee the I/O and CPU optimality for twig queries. Experiments on representative data set indicate that the proposed solution performs significantly.展开更多
基金Supported by the National Natural Science Foundation of China (60573089)the Teaching and Research Award Program for Out-standing Young Teachers in Post-Secondary Institutions by the Ministry of Education, China (706016)
文摘Holistic twig query processing techniques based on region encoding have been developed to minimize the intermedi-ate results,namely,those root-to-leaf path matches that are not in the final twig results. These algorithms have to scan all the streams of tags in query patterns. However,useless path matches cannot be completely avoided. TJFast which is based on the labeling scheme of Extended Dewey has been proposed to avoid useless intermedi-ate results,and it only needs to access the labels of the leaf query nodes. However,it don't concern about the characteristics of ele-ments with the same parent,and it has to merge join all the inter-mediate results which are evaluated during the first phrase. We propose a new labeling scheme to compress the XML elements which have the same characteristic. Based on the compressed path-labeled streams,a new novel holistic twig query algorithm named CPJoin is designed. Finally,implementation results are provided to show that CPJoin has good performance on both real and synthetic data.
基金Supported partially by the National Natural Science Foundation of China (Grant No. 60833005)the National High-Tech Research & Development Program of China (Grant Nos. 2007AA01Z155, 2009AA011904)the National Basic Research Program of China (Grant No.2003CB317000)
文摘As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this feature that makes it infeasible in many actual scenarios. This has driven the requirement of relaxing the complete specification of a twig pattern to express more flexible semantic constraints in a single query expression. In this paper, we focus on query evaluation of partially specified twig pattern (PSTP) queries, through which we can reap the most flexibility of specifying partial semantic constraints in a query expression. We propose an extension to XPath through introducing two Samepath axes to support partial semantic constraints in a concise but effective way. Then we propose a stack based algorithm, pTwigStack, to process a PSTP holistically without deriving the concrete twig patterns and then processing them one by one. Further, we propose two DTD schema based optimization methods to improve the performance of pTwigStack algorithm. Our experimental results on various datasets indicate that our method performs significantly better than existing ones when processing PSTPs.
文摘针对传统XML文档小枝模式查询算法中,与模式树中标签名相同的节点均入内存,易造成很大的空间浪费问题,提出了一种新的算法—StreamFWM(StreamFilter Without Merging)。StreamFWM采用区间编码方式,依据节点间的结构关系过滤标签流中无用的中间节点,且不用归并,只用简单的栈和列表实现。实验结果证明,算法StreamFWM相比TwigStack在查询处理的性能上有所提高。
基金supported by the National High-Tech Research and Development Plan of China (Grant No.2005AA4Z3030)
文摘Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture paths of all elements, and a twig query is modeled as an XPattern extended from tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. According to these measures, an effiective algorithm to answer for a twig query, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries with branch ([ ]), child edge (/), descendant edge (//), and wildcard (*) synthetically. We show that path-partitioned encoding scheme and XPattern guarantee the I/O and CPU optimality for twig queries. Experiments on representative data set indicate that the proposed solution performs significantly.