Abstract: Graphs are widely used to represent complex data in many real applications, such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become imperative for integrating noisy and inconsistent data from multiple data sources. Edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the join, we first apply a preprocessing strategy that removes mismatching graph pairs with significant differences. We then propose a novel k-hop tree based indexing method, which builds an index for each graph by grouping, for each key node, the nodes reachable within k hops while preserving structure. For each candidate pair, we propose a similarity computation algorithm with boundary filtering that is both efficient and effective. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. Moreover, the join process finishes in polynomial time.
Funding: The Natural Science Foundation of Heilongjiang Province under Grant No. F2018028.
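The abstract above does not spell out its preprocessing strategy, so the following is only a minimal sketch of one standard way to discard clearly mismatching pairs before any expensive comparison. It assumes unit-cost edit operations in which each operation inserts or deletes a single isolated node or a single edge, or relabels one element; under that assumption the difference in node counts plus the difference in edge counts is a lower bound on the graph edit distance. The function names, the adjacency-dictionary representation, and the threshold name `tau` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def size_signature(adj):
    """Return (node count, edge count) of an undirected graph {node: set(neighbors)}."""
    n_nodes = len(adj)
    # Every undirected edge appears in two neighbor sets.
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_nodes, n_edges


def size_lower_bound(g1, g2):
    """Lower bound on unit-cost edit distance from node and edge counts alone."""
    n1, e1 = size_signature(g1)
    n2, e2 = size_signature(g2)
    return abs(n1 - n2) + abs(e1 - e2)


def candidate_pairs(graphs, tau):
    """Keep only index pairs whose lower bound does not exceed the threshold tau."""
    return [(i, j) for i, j in combinations(range(len(graphs)), 2)
            if size_lower_bound(graphs[i], graphs[j]) <= tau]


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(candidate_pairs([triangle, path, star], tau=1))  # prints [(0, 1)]
```

Pairs that survive this filter are only candidates; they would still need the index-based and boundary-filtering steps described in the abstract to be confirmed or rejected.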
Abstract: Graph similarity search is a common operation in graph databases, and the graph edit distance constraint is the most common similarity measure for solving the graph similarity search problem. However, computing graph edit distance exactly has been proved to be NP-hard, so current methods adopt a filter-and-verification framework. This paper proposes a clustering index structure based on a dictionary tree (trie) to reduce the cost incurred by candidate graphs, and validates it in the filtering stage. An efficient incremental partition algorithm is also designed: by calculating the distance between the query graph and the partitions of a candidate graph, the filtering effect is further enhanced. Experiments on large real graph datasets show that the algorithm performs significantly better than existing algorithms.
Abstract: A set in R^d is called regular if its Hausdorff dimension coincides with its upper box-counting dimension. It is proved that a random graph-directed self-similar set is regular a.e.
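For readers who want the two dimensions written out, the standard definitions are recalled below. The notation is an assumption made for this note and is not taken from the abstract: $E$ is a non-empty bounded subset of $\mathbb{R}^d$, $|U|$ is the diameter of $U$, and $N_\delta(E)$ is the smallest number of sets of diameter at most $\delta$ needed to cover $E$.

```latex
\begin{align*}
\mathcal{H}^s(E) &= \lim_{\delta \to 0}\,
  \inf\Bigl\{ \sum_i |U_i|^s \;:\; E \subseteq \bigcup_i U_i,\ |U_i| \le \delta \Bigr\}, \\
\dim_H E &= \inf\{\, s \ge 0 : \mathcal{H}^s(E) = 0 \,\}, \\
\overline{\dim}_B E &= \limsup_{\delta \to 0} \frac{\log N_\delta(E)}{-\log \delta}.
\end{align*}
```

The inequality $\dim_H E \le \overline{\dim}_B E$ always holds, so regularity in the sense of the abstract is the statement that equality is attained: $\dim_H E = \overline{\dim}_B E$.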
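Abstract: As data sources diversify, multi-view clustering has become a research hotspot. Most algorithms focus too heavily on using graph structure to seek a consistent representation while neglecting how to learn the graph structure itself; in addition, some methods typically optimize on fixed views. To address these problems, a multi-view clustering algorithm based on similarity graph projection learning (MCSGP) is proposed. By using projection graphs, it effectively fuses global structural information and local latent information into a single consensus graph, rather than merely pursuing consistency between each view and the consensus graph. By imposing a rank constraint on the graph Laplacian of the consensus graph matrix, the algorithm naturally partitions the data points into the required number of clusters. In experiments on two synthetic datasets and seven real datasets, MCSGP clusters the synthetic datasets well and achieves the best result on 17 of the 21 metrics evaluated on the real datasets, which demonstrates the algorithm's superior performance.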
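The rank constraint mentioned above relies on a standard fact from spectral graph theory: for a graph on n nodes with non-negative edge weights, the Laplacian L = D - W has rank n - c exactly when the graph has c connected components. The sketch below only checks this fact on a toy similarity matrix; it is not the MCSGP optimization itself, and the function names and the example matrix are hypothetical.

```python
from fractions import Fraction


def laplacian(weights):
    """Return L = D - W for a symmetric, non-negative weight matrix W."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0) - weights[i][j]
             for j in range(n)] for i in range(n)]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def count_components(weights):
    """Count connected components, treating positive weights as edges."""
    n = len(weights)
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if weights[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components


# Toy consensus graph with two groups: {0, 1, 2} and {3, 4}.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
n = len(W)
print(n - matrix_rank(laplacian(W)), count_components(W))  # prints 2 2
```

In this reading, requiring rank(L) = n - c during learning forces the consensus graph to split into exactly c components, so the cluster assignment can be read off the graph directly.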
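Abstract: Knowledge graph embedding aims to generate low-dimensional continuous feature vectors for the entities and relations in a knowledge graph, so that computers can mine the latent semantics of knowledge through mathematical operations and apply them to downstream tasks such as triple completion, entity classification, and entity resolution. The translation model (Trans) is a simple and effective knowledge graph embedding method that uses negative sampling to improve embedding accuracy. However, traditional negative sampling is random and tends to generate low-quality negative triples, which leads to inaccurately trained embedding vectors for entities and relations. To address this problem, this paper proposes a similar-entity negative sample generator based on the Canopy and K-means methods, Negative Sampling of Similar Entities (NSSE), to generate high-quality negative samples. Experimental results show that translation models using NSSE produce better embedding vectors than the original models.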
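The abstract does not give the details of NSSE, so the following is only a rough sketch of the general idea of drawing negatives from similar entities. It assumes the clustering step (Canopy followed by K-means in the paper) has already been run and that its result is available as a plain mapping from cluster id to entity list; the function name, the data layout, and the toy entities are all hypothetical.

```python
import random


def sample_similar_negative(triple, clusters, entity_cluster, known, rng):
    """Corrupt the head or tail of a triple with an entity from the same cluster.

    clusters: {cluster_id: [entity, ...]} (assumed output of a clustering step)
    entity_cluster: {entity: cluster_id}
    known: set of positive triples that must not be returned as negatives
    Returns None when no valid corruption exists.
    """
    head, rel, tail = triple
    options = []
    for position, entity in (("head", head), ("tail", tail)):
        for cand in clusters[entity_cluster[entity]]:
            if cand == entity:
                continue
            neg = (cand, rel, tail) if position == "head" else (head, rel, cand)
            if neg not in known:  # skip corruptions that are actually true
                options.append(neg)
    return rng.choice(options) if options else None


clusters = {0: ["Paris", "Lyon", "Berlin"], 1: ["France", "Germany"]}
entity_cluster = {e: cid for cid, members in clusters.items() for e in members}
known = {("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")}

rng = random.Random(0)
print(sample_similar_negative(("Paris", "capital_of", "France"),
                              clusters, entity_cluster, known, rng))
```

Compared with uniform random corruption, a negative such as ("Lyon", "capital_of", "France") is harder to tell apart from the positive triple, which is the intuition behind preferring similar entities as replacements.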