摘要
SimRank是一种衡量有向图中任意两节点间结构相似度的模型,其主要思想为,若图中两个节点被相似节点引用,则这两个节点相似.SimRank计算的相似度被广泛应用到网络图聚类、近似查询和协同过滤等领域.SimRank计算模型是一个递归模型,其计算时间、空间复杂度非常高,很难应用于大规模图计算.过去十几年,研究者们针对大规模图提出了许多高效或近似计算的SimRank计算算法.本文首先介绍SimRank模型的描述,以及常见的SimRank计算问题定义,然后按照计算方式将这些算法分为迭代法、非迭代法与随机游走法三类;将非迭代法分为基于矩阵运算求解、基于节点对图求解以及基于线性表示求解,将随机游走法分为基于不同索引结构求解、基于不同抽样方式求解以及其他随机游走算法;介绍了这些算法的基本概念、计算原理以及算法特点;分析了随机游走法与迭代法、非迭代法之间的关系;对各种算法的时间复杂度、空间复杂度、计算精确度以及可扩展性进行了论述;在此基础总结了这些SimRank算法所对应的计算场景,主要包括单点对/单源(Single Pair/Single Source)查询问题、全体/部分节点对(All Pair/Partial Pair)计算问题以及查询问题.最后对不同算法实验中图的规模进行了总结,并对大规模图上的SimRank计算方法进行了总结和展望.
SimRank is a model for measuring the similarity of two vertices in a directed graph.The main idea is that,if two vertices in the same graph are referenced by similar vertices,the two vertices are similar.SimRank scores are widely used in graph clustering,approximate query and collaborative filtering.As SimRank model is a recursive model,its computational time and space complexity of SimRank is very high.So it is difficult to apply the model to large-scale graphs.Over the past decades,researchers have proposed many efficient or approximate computational SimRank calculation algorithms for large-scale graphs.In this paper,we first introduce the definition of SimRank,and the definitions of common SimRank calculation problems.Then we introduce these algorithms and divide these algorithms into three categories:iterative algorithms,non-iterative algorithms and random walk algorithms.Furthermore,we divide the non-iterative algorithms into matrix operation based algorithms,on node-pair graph based algorithms and linear representation based algorithms,and divide the random walk algorithms into index based algorithms,sampling methods based algorithms and other random walk algorithms.Meanwhile we introduce basic concepts,calculation principles and algorithm property of these algorithms.The analysis of the relationship between the random walk algorithm and the other two is also conducted.We summarize the time complexity,space complexity and scalability of these algorithms.And we summarize the scenarios where these algorithms are applied,which mainly are Single Pair/Single Source,All Pair/Partial Pair and SimRank Join Query problems.At last,the scale of graphs in experiments of different algorithms is summarized,and the calculation methods of SimRank on large-scale graphs are summarized and forecasted.
作者
张良富
李翠平
陈红
ZHANG Liang-Fu;LI Cui-Ping;CHEN Hong(School of Information,Renmin University of China,Beijing 100872;Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education(Renmin University of China),Beijing 100872)
出处
《计算机学报》
EI
CSCD
北大核心
2019年第12期2665-2682,共18页
Chinese Journal of Computers
基金
国家重点研发计划(2018YFB1004401)
国家自然科学基金(61772537,61772536,61702522,61532021)资助~~