Graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. To accelerate the simila...Graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. To accelerate the similarity join based on graph edit distance, in the paper, we make use of a preprocessing strategy to remove the mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop-tree based indexing method. Experiments on real and synthetic graph databases also confirm that our method can achieve good join quality in graph similarity join. Besides, the join process can be finished in polynomial time.展开更多
通过在U-tree中添加时间戳和速度矢量等时空因素,提出一种基于U-tree的高效率当前及未来不确定位置信息检索的索引结构TPU-tree,可以支持多维空间中不确定移动对象的索引,并提出了一种改进的基于p-bound的MP_BBRQ(modifiedp-bound based...通过在U-tree中添加时间戳和速度矢量等时空因素,提出一种基于U-tree的高效率当前及未来不确定位置信息检索的索引结构TPU-tree,可以支持多维空间中不确定移动对象的索引,并提出了一种改进的基于p-bound的MP_BBRQ(modifiedp-bound based range query)域查询处理算法,能够引入搜索区域进行预裁剪以减少查询精炼阶段所需代价偏高的积分计算.实验仿真表明,采用MP_BBRQ算法的TPU-tree概率查询性能极大地优于传统的TPR-tree索引,且更新性能与传统索引大致相当,具有良好的实用价值.展开更多
Recent studies have addressed that the cache be havior is important in the design of main memory index structures. Cache-conscious indices such as the CSB^+-tree are shown to outperform conventional main memory indic...Recent studies have addressed that the cache be havior is important in the design of main memory index structures. Cache-conscious indices such as the CSB^+-tree are shown to outperform conventional main memory indices such as the AVL-tree and the T-tree. This paper proposes a cacheconscious version of the T-tree, CST-tree, defined according to the cache-conscious definition. To separate the keys within a node into two parts, the CST-tree can gain higher cache hit ratio.展开更多
As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfo...As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfortunately, the predominant indexes -B^+-trees and T-trees -- have been shown to utilize cache poorly, which triggers the development of many cache-conscious indexes, such as CSB^+-trees and pB^+-trees. Most of these cache-conscious indexes are variants of conventional B^+-trees, and have better cache performance than B^+-trees. In this paper, we develop a novel J^+-tree index, inspired by the Judy structure which is an associative array data structure, and propose a more cacheoptimized index -- Prefetching J^+-tree (pJ^+-tree), which applies prefetching to J^+-tree to accelerate range scan operations. The J^+-tree stores all the keys in its leaf nodes and keeps the reference values of leaf nodes in a Judy structure, which makes J^+-tree not only hold the advantages of Judy (such as fast single value search) but also outperform it in other aspects. For example, J^+-trees can achieve better performance on range queries than Judy. The pJ^+-tree index exploits prefetching techniques to further improve the cache behavior of J^+-trees and yields a speedup of 2.0 on range scans. Compared with B^+-trees, CSB^+-trees, pB^+-trees and T-trees, our extensive experimental Study shows that pJ^+-trees can provide better performance on both time (search, scan, update) and space aspects.展开更多
文摘Graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. To accelerate the similarity join based on graph edit distance, in the paper, we make use of a preprocessing strategy to remove the mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop-tree based indexing method. Experiments on real and synthetic graph databases also confirm that our method can achieve good join quality in graph similarity join. Besides, the join process can be finished in polynomial time.
文摘通过在U-tree中添加时间戳和速度矢量等时空因素,提出一种基于U-tree的高效率当前及未来不确定位置信息检索的索引结构TPU-tree,可以支持多维空间中不确定移动对象的索引,并提出了一种改进的基于p-bound的MP_BBRQ(modifiedp-bound based range query)域查询处理算法,能够引入搜索区域进行预裁剪以减少查询精炼阶段所需代价偏高的积分计算.实验仿真表明,采用MP_BBRQ算法的TPU-tree概率查询性能极大地优于传统的TPR-tree索引,且更新性能与传统索引大致相当,具有良好的实用价值.
基金Supported bythe National High Technology of 863Project (2002AA1Z2308 ,2002AA118030)
文摘Recent studies have addressed that the cache be havior is important in the design of main memory index structures. Cache-conscious indices such as the CSB^+-tree are shown to outperform conventional main memory indices such as the AVL-tree and the T-tree. This paper proposes a cacheconscious version of the T-tree, CST-tree, defined according to the cache-conscious definition. To separate the keys within a node into two parts, the CST-tree can gain higher cache hit ratio.
基金supported by a grant from HP Lab China,and the National Natural Science Foundation of China under Grant Nos.60496325 and 60573092
文摘As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfortunately, the predominant indexes -B^+-trees and T-trees -- have been shown to utilize cache poorly, which triggers the development of many cache-conscious indexes, such as CSB^+-trees and pB^+-trees. Most of these cache-conscious indexes are variants of conventional B^+-trees, and have better cache performance than B^+-trees. In this paper, we develop a novel J^+-tree index, inspired by the Judy structure which is an associative array data structure, and propose a more cacheoptimized index -- Prefetching J^+-tree (pJ^+-tree), which applies prefetching to J^+-tree to accelerate range scan operations. The J^+-tree stores all the keys in its leaf nodes and keeps the reference values of leaf nodes in a Judy structure, which makes J^+-tree not only hold the advantages of Judy (such as fast single value search) but also outperform it in other aspects. For example, J^+-trees can achieve better performance on range queries than Judy. The pJ^+-tree index exploits prefetching techniques to further improve the cache behavior of J^+-trees and yields a speedup of 2.0 on range scans. Compared with B^+-trees, CSB^+-trees, pB^+-trees and T-trees, our extensive experimental Study shows that pJ^+-trees can provide better performance on both time (search, scan, update) and space aspects.