With the development of information technology, the amount of power grid topology data has gradually increased. Therefore, accurate querying of this data has become particularly important. Several researchers have cho...With the development of information technology, the amount of power grid topology data has gradually increased. Therefore, accurate querying of this data has become particularly important. Several researchers have chosen different indexing methods in the filtering stage to obtain more optimized query results because currently there is no uniform and efficient indexing mechanism that achieves good query results. In the traditional algorithm, the hash table for index storage is prone to "collision" problems, which decrease the index construction efficiency. Aiming at the problem of quick index entry, based on the construction of frequent subgraph indexes, a method of serialized storage optimization based on multiple hash tables is proposed. This method mainly uses the exploration sequence to make the keywords evenly distributed; it avoids conflicts of the stored procedure and performs a quick search of the index. The proposed algorithm mainly adopts the "filterverify" mechanism; in the filtering stage, the index is first established offline, and then the frequent subgraphs are found using the "contains logic" rule to obtain the candidate set. Experimental results show that this method can reduce the time and scale of candidate set generation and improve query efficiency.展开更多
The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer s...The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.展开更多
频繁模式挖掘(FPM)是图数据研究领域的一个经典问题,单一大图上的FPM问题近年来受到了更加广泛的关注。该问题被定义为根据用户给定的频率阈值查找在大图(Graph)中频繁出现的所有模式图(Pattern)。近年来,人们见证了FPM在多个领域的广...频繁模式挖掘(FPM)是图数据研究领域的一个经典问题,单一大图上的FPM问题近年来受到了更加广泛的关注。该问题被定义为根据用户给定的频率阈值查找在大图(Graph)中频繁出现的所有模式图(Pattern)。近年来,人们见证了FPM在多个领域的广泛应用,例如社交网络分析、欺诈检测等。然而,面对新兴的应用需求,人们需要更具语义表达力的模式图及其挖掘技术。为此,在传统模式图的基础上,首先提出了量化模式图(Quantified Graph Patterns,QGPs)——一类具有计数量词约束的模式图,实现了模式图语义的扩展;其次设计了一种在分布式场景下挖掘QGPs的算法,提出了量化图模式关联规则(Quantified Graph Pattern Association Rules,QGPARs)及其挖掘技术,用于预测(社交)网络中实体之间的潜在联系,然后利用真实图和合成图数据,通过翔实的实验验证了QGPs挖掘算法的计算效率,通过与经典链接预测方法进行对比,发现QGPARs可以取得更高的链接预测准确性;最后通过与传统图模式关联规则(Graph Pattern Association Rules,GPARs)的链接预测结果进行对比,验证了QGPARs与GPARs之间在链接预测结果方面存在显著差异,也进一步验证了QGPARs在链接预测中的有效性。展开更多
结合基于图的关联规则挖掘和双向搜索的策略,产生最大频繁项集,从而提出基于图的最大频繁项集(graph based maximum frequen tset,GBMFS)生成算法。运用此算法,结合社会网络的动态特征,发现社会网络中所存在的团伙的核心成员。最后,在...结合基于图的关联规则挖掘和双向搜索的策略,产生最大频繁项集,从而提出基于图的最大频繁项集(graph based maximum frequen tset,GBMFS)生成算法。运用此算法,结合社会网络的动态特征,发现社会网络中所存在的团伙的核心成员。最后,在实际系统中对相关的算法进行了验证。展开更多
很多频繁子图挖掘算法已被提出.然而,这些算法产生的频繁子图数量太多而不能被用户有效地利用.为此,提出了一个新的研究问题:挖掘图数据库中的频繁跳跃模式.挖掘频繁跳跃模式既可以大幅度地减少输出模式的数量,又能使有意义的图模式保...很多频繁子图挖掘算法已被提出.然而,这些算法产生的频繁子图数量太多而不能被用户有效地利用.为此,提出了一个新的研究问题:挖掘图数据库中的频繁跳跃模式.挖掘频繁跳跃模式既可以大幅度地减少输出模式的数量,又能使有意义的图模式保留在挖掘结果中.此外,跳跃模式还具有抗噪声干扰能力强等优点.然而,由于跳跃模式不具有反单调性质,挖掘它们非常具有挑战性.通过研究跳跃模式自身的特性,提出了两种新的裁剪技术:基于内扩展的裁剪和基于外扩展的裁剪.在此基础上又给出了一种高效的挖掘算法GraphJP(an algorithm for mining jump patterns from graph databases).另外,还严格证明了裁剪技术和算法GraphJP的正确性.实验结果表明,所提出的裁剪技术能够有效地裁剪图模式搜索空间,算法GraphJP是高效、可扩展的.展开更多
基金supported by the State Grid Science and Technology Project (Title: Research on High Performance Analysis Technology of Power Grid GIS Topology Based on Graph Database, 5455HJ160005)
文摘With the development of information technology, the amount of power grid topology data has gradually increased. Therefore, accurate querying of this data has become particularly important. Several researchers have chosen different indexing methods in the filtering stage to obtain more optimized query results because currently there is no uniform and efficient indexing mechanism that achieves good query results. In the traditional algorithm, the hash table for index storage is prone to "collision" problems, which decrease the index construction efficiency. Aiming at the problem of quick index entry, based on the construction of frequent subgraph indexes, a method of serialized storage optimization based on multiple hash tables is proposed. This method mainly uses the exploration sequence to make the keywords evenly distributed; it avoids conflicts of the stored procedure and performs a quick search of the index. The proposed algorithm mainly adopts the "filterverify" mechanism; in the filtering stage, the index is first established offline, and then the frequent subgraphs are found using the "contains logic" rule to obtain the candidate set. Experimental results show that this method can reduce the time and scale of candidate set generation and improve query efficiency.
文摘The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.
文摘频繁模式挖掘(FPM)是图数据研究领域的一个经典问题,单一大图上的FPM问题近年来受到了更加广泛的关注。该问题被定义为根据用户给定的频率阈值查找在大图(Graph)中频繁出现的所有模式图(Pattern)。近年来,人们见证了FPM在多个领域的广泛应用,例如社交网络分析、欺诈检测等。然而,面对新兴的应用需求,人们需要更具语义表达力的模式图及其挖掘技术。为此,在传统模式图的基础上,首先提出了量化模式图(Quantified Graph Patterns,QGPs)——一类具有计数量词约束的模式图,实现了模式图语义的扩展;其次设计了一种在分布式场景下挖掘QGPs的算法,提出了量化图模式关联规则(Quantified Graph Pattern Association Rules,QGPARs)及其挖掘技术,用于预测(社交)网络中实体之间的潜在联系,然后利用真实图和合成图数据,通过翔实的实验验证了QGPs挖掘算法的计算效率,通过与经典链接预测方法进行对比,发现QGPARs可以取得更高的链接预测准确性;最后通过与传统图模式关联规则(Graph Pattern Association Rules,GPARs)的链接预测结果进行对比,验证了QGPARs与GPARs之间在链接预测结果方面存在显著差异,也进一步验证了QGPARs在链接预测中的有效性。
文摘很多频繁子图挖掘算法已被提出.然而,这些算法产生的频繁子图数量太多而不能被用户有效地利用.为此,提出了一个新的研究问题:挖掘图数据库中的频繁跳跃模式.挖掘频繁跳跃模式既可以大幅度地减少输出模式的数量,又能使有意义的图模式保留在挖掘结果中.此外,跳跃模式还具有抗噪声干扰能力强等优点.然而,由于跳跃模式不具有反单调性质,挖掘它们非常具有挑战性.通过研究跳跃模式自身的特性,提出了两种新的裁剪技术:基于内扩展的裁剪和基于外扩展的裁剪.在此基础上又给出了一种高效的挖掘算法GraphJP(an algorithm for mining jump patterns from graph databases).另外,还严格证明了裁剪技术和算法GraphJP的正确性.实验结果表明,所提出的裁剪技术能够有效地裁剪图模式搜索空间,算法GraphJP是高效、可扩展的.