Graph similarity join has become essential for integrating noisy and inconsistent data from multiple data sources. Edit distance is commonly used to measure the similarity between graphs. To accelerate similarity joins based on graph edit distance, this paper first applies a preprocessing strategy that removes mismatching graph pairs with significant differences. It then proposes a novel indexing method, the k-hop-tree based indexing method, which builds an index for each graph by grouping the nodes reachable within k hops of each key node while preserving structure. Experiments on real and synthetic graph databases confirm that the method achieves good join quality in graph similarity join, and the join process finishes in polynomial time.
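The abstract does not spell out the filter or the index layout, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's algorithm. It assumes undirected graphs stored as adjacency dictionaries, unit-cost edit operations (inserting or deleting one node or one edge costs 1), and a simple count-based lower bound as the stand-in for the preprocessing step. The names `size_lower_bound`, `candidate_pairs`, and `k_hop_tree` are hypothetical.

```python
from collections import deque
from itertools import product


def size_lower_bound(g1, g2):
    """Count-based lower bound on unit-cost graph edit distance.

    Assumption: each edit operation changes the node count or the edge
    count by at most one, so the count differences cannot exceed the distance.
    """
    n1, n2 = len(g1), len(g2)
    m1 = sum(len(nbrs) for nbrs in g1.values()) // 2  # undirected edges
    m2 = sum(len(nbrs) for nbrs in g2.values()) // 2
    return abs(n1 - n2) + abs(m1 - m2)


def candidate_pairs(left, right, tau):
    """Keep only index pairs whose lower bound does not exceed threshold tau."""
    return [
        (i, j)
        for (i, g1), (j, g2) in product(enumerate(left), enumerate(right))
        if size_lower_bound(g1, g2) <= tau
    ]


def k_hop_tree(graph, root, k):
    """Breadth-first tree of the nodes reachable from root within k hops.

    Returns a dict mapping node -> (parent, depth); the parent links keep
    the tree structure rather than a flat set of reachable nodes.
    """
    tree = {root: (None, 0)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        depth = tree[u][1]
        if depth == k:
            continue  # do not expand beyond k hops
        for v in sorted(graph[u]):  # sorted for a deterministic tree
            if v not in tree:
                tree[v] = (u, depth + 1)
                queue.append(v)
    return tree


# Toy graphs: a 4-node path and a triangle.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
```

With these toy graphs, `candidate_pairs([path], [triangle], tau=0)` discards the pair because the node counts already differ, while `k_hop_tree(path, "a", 2)` stops at node `c` and never reaches `d`. How the paper selects key nodes and compares the resulting trees is not stated in the abstract and is left out here.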
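This paper examines cost models for spatial joins based on the R-Tree, focusing on the spatial join cost model proposed by HUANG Y W. A best/worst selection strategy is used to reduce the time complexity of the algorithm, and an improved estimation formula is proposed for the buffer-based cost model. Experiments verify that the improved model estimates costs more accurately than the original model.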