This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications inclu...This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications including advertising industry,public security,task crowdsourcing and social network,etc.MSMCQ can be studied as a special case of the general set similarity query(SSQ).However,the MCs of G has several specialties from the general sets.Based on the specialties of MCs,we propose a novel index,namely MCIndex.MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time.Specifically,we first construct an inverted indexⅠfor all the MCs of G.Since the MCs in a posting list often have a lot of overlaps,MCIndex selects some pivots to cluster the MCs with a small radius.Given a query Q,we compute the distance from the pivots to Q.The clusters of the pivots assured not answer can be pruned by our distance based pruning rule.Since it is NP-hard to construct a minimum MCIndex,we propose to construct a minimal MCIndex onⅠ(v)with an approximation ratio 1+ln|Ⅰ(v)|.Since the MCs have properties that are inherent of graph structure,we further propose a S Index within each cluster of a MCIndex and a structure based pruning rule.S Index can significantly reduce the number of candidates.Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation,we also propose a binary representation of MCs to improve the efficiency of the intersection size computation.Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.展开更多
基金support of NSF of China(61502258,61806105)Major Technology Innovation Project of Shandong(2018CXGC0703)+2 种基金NSF of Shandong Province(ZR2014FQ007)the Project of Shandong Finance Society(2018SDJR31)Soft Science Fund of Shandong Province(2018RKB01373,2016RKB01043).
文摘This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications including advertising industry,public security,task crowdsourcing and social network,etc.MSMCQ can be studied as a special case of the general set similarity query(SSQ).However,the MCs of G has several specialties from the general sets.Based on the specialties of MCs,we propose a novel index,namely MCIndex.MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time.Specifically,we first construct an inverted indexⅠfor all the MCs of G.Since the MCs in a posting list often have a lot of overlaps,MCIndex selects some pivots to cluster the MCs with a small radius.Given a query Q,we compute the distance from the pivots to Q.The clusters of the pivots assured not answer can be pruned by our distance based pruning rule.Since it is NP-hard to construct a minimum MCIndex,we propose to construct a minimal MCIndex onⅠ(v)with an approximation ratio 1+ln|Ⅰ(v)|.Since the MCs have properties that are inherent of graph structure,we further propose a S Index within each cluster of a MCIndex and a structure based pruning rule.S Index can significantly reduce the number of candidates.Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation,we also propose a binary representation of MCs to improve the efficiency of the intersection size computation.Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.