Holistic twig query processing techniques based on region encoding have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. These algori...Holistic twig query processing techniques based on region encoding have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. These algorithms have to scan all the streams of tags in query patterns. However, useless path matches cannot be completely avoided. TJFast which is based on the labeling scheme of Extended Dewey has been proposed to avoid useless intermediate results, and it only needs to access the labels of the leaf query nodes. However, it don't concern about the characteristics of elements with the same parent, and it has to merge join all the intermediate results which are evaluated during the first phrase. We propose a new labeling scheme to compress the XML elements which have the same characteristic. Based on the compressed path-labeled streams, a new novel holistic twig query algorithm named CPJoin is designed. Finally, implementation results are provided to show that CPJoin has good performance on both real and synthetic data.展开更多
基金Supported by the National Natural Science Foundation of China (60573089)the Teaching and Research Award Program for Out-standing Young Teachers in Post-Secondary Institutions by the Ministry of Education, China (706016)
文摘Holistic twig query processing techniques based on region encoding have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. These algorithms have to scan all the streams of tags in query patterns. However, useless path matches cannot be completely avoided. TJFast which is based on the labeling scheme of Extended Dewey has been proposed to avoid useless intermediate results, and it only needs to access the labels of the leaf query nodes. However, it don't concern about the characteristics of elements with the same parent, and it has to merge join all the intermediate results which are evaluated during the first phrase. We propose a new labeling scheme to compress the XML elements which have the same characteristic. Based on the compressed path-labeled streams, a new novel holistic twig query algorithm named CPJoin is designed. Finally, implementation results are provided to show that CPJoin has good performance on both real and synthetic data.