Abstract: Graphs are widely used to represent complex data in many real applications, such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become essential for integrating noisy and inconsistent data from multiple data sources. Edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the join, we first apply a preprocessing strategy that removes mismatching graph pairs with significant differences. We then propose a novel k-hop tree based indexing method, which builds an index for each graph by grouping, for each key node, the nodes reachable within k hops while preserving structure. For each candidate pair, we propose a similarity computation algorithm with boundary filtering that is both efficient and effective. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. In addition, the join process can be completed in polynomial time.
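The abstract does not say how the preprocessing filter or the k-hop tree is defined, so the following is only a minimal sketch under stated assumptions. It assumes unit-cost edit operations, undirected unlabeled graphs stored as adjacency dictionaries, and a self-join over one list of graphs. The size-based bound is a standard lower bound on graph edit distance, swapped in here for illustration, and the function names (`size_lower_bound`, `candidate_pairs`, `k_hop_tree`) are hypothetical rather than taken from the paper.

```python
from collections import deque
from itertools import combinations


def count_edges(adj):
    """Number of undirected edges in an adjacency dict {node: set(neighbors)}."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def size_lower_bound(g, h):
    """Lower bound on unit-cost graph edit distance.

    Each edit operation changes the vertex count or the edge count by at
    most one, so the count differences can never exceed the true distance.
    """
    return abs(len(g) - len(h)) + abs(count_edges(g) - count_edges(h))


def candidate_pairs(graphs, tau):
    """Index pairs (i, j), i < j, that survive the size filter for threshold tau."""
    return [
        (i, j)
        for i, j in combinations(range(len(graphs)), 2)
        if size_lower_bound(graphs[i], graphs[j]) <= tau
    ]


def k_hop_tree(adj, root, k):
    """Breadth-first tree of the nodes within k hops of root.

    Returns {node: parent}; the root maps to None. Keeping parent links is
    one assumed way of retaining structure alongside the node grouping.
    """
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand beyond k hops
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent
```

Pairs rejected by `candidate_pairs` cannot be within edit distance `tau`, so only the surviving pairs would need the more expensive similarity computation.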
Funding: The Natural Science Foundation of Heilongjiang Province under Grant No. F2018028.
Abstract: Graph similarity search is a common operation in graph databases, and the graph edit distance constraint is the most common similarity measure used to solve the graph similarity search problem. However, computing graph edit distance exactly has been proved to be NP-hard, so current methods adopt a filter-and-verification framework. In this paper, a dictionary tree based clustering index structure is proposed to reduce the cost of candidate graphs and is verified in the filtering stage. An efficient incremental partition algorithm is also designed. By calculating the distance between the query graph and the partitions of a candidate graph, the filtering effect is further enhanced. Experiments on large real graph datasets show that the algorithm performs significantly better than existing algorithms.
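To make the filter-and-verification framework concrete, here is a minimal sketch under stated assumptions. It does not reproduce the paper's dictionary tree index or its incremental partition algorithm. Instead it swaps in a standard label-multiset lower bound as the filter, assumes unit-cost edits and unlabeled edges, and takes the exact distance computation as a caller-supplied `verify` function. All names here (`label_lower_bound`, `similarity_search`, `verify`) are hypothetical.

```python
from collections import Counter


def label_lower_bound(q, g):
    """Lower bound on unit-cost graph edit distance from labels and sizes.

    A graph is a pair (labels, edges): labels maps node -> label and edges
    is a set of node pairs. Vertices whose labels cannot be matched need at
    least one edit each, and so does every surplus edge.
    """
    q_labels, q_edges = q
    g_labels, g_edges = g
    shared = Counter(q_labels.values()) & Counter(g_labels.values())
    vertex_term = max(len(q_labels), len(g_labels)) - sum(shared.values())
    edge_term = abs(len(q_edges) - len(g_edges))
    return vertex_term + edge_term


def similarity_search(query, database, tau, verify):
    """Return ids of graphs whose edit distance to query is at most tau.

    database maps graph id -> graph. verify(query, graph) is assumed to
    return the exact edit distance; it is only called on graphs that
    survive the cheap filter.
    """
    results = []
    for gid, g in database.items():
        if label_lower_bound(query, g) > tau:
            continue  # filtering stage: cannot be within tau
        if verify(query, g) <= tau:  # verification stage
            results.append(gid)
    return results
```

The design point is that the NP-hard `verify` step runs only on the candidates the filter cannot rule out. A tighter filter, such as the partition-based distance the abstract describes, would shrink that candidate set further.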
Funding: The National Natural Science Foundation of China (No. 61170192); the Natural Science Foundations of Municipality of Chongqing (No. CSTC2012JJB40012).
Abstract: With the development of social media and the Internet, discovering latent information in massive amounts of data is becoming particularly relevant to improving user experience. Research based on preferences and relationships between users has attracted increasing attention. Predictive problems, such as inferring friend relationships and co-author relationships between users, have been explored. However, many such methods analyze either node features or network structure separately; few have tried to tackle both at the same time. In this paper, in order to discover latent co-interest relationships, we consider not only users' attributes but also network information. In addition, we propose an Interest-based Factor Graph Model (I-FGM) to incorporate these factors. Experiments on two data sets (a bookmarking network and a music network) demonstrate that this predictive method achieves better results than three other methods (ANN, NB, and SVM).