摘要
信息网络无处不在.通过把网络中的对象抽象为点,把对象之间的关系刻画为边,相应的信息网络就可以用图来表示.图中结点相似度计算是图数据管理中的基本问题,在很多领域都有运用,比如社会网络分析、信息检索和推荐系统等.其中,著名的相似度度量是以Personalized Page Rank和Sim Rank为代表.这两种度量本质都是以图中的路径来定义,然而它们侧重的路径截然不同.为此,提出了一个度量Super Sim Rank.它不仅涵盖了这些路径,而且考虑了Personalized Page Rank和Sim Rank两者都没有考虑的路径,从而能够更加体现出这种链接关系的本质.在此基础上对Super Sim Rank进行了理论分析,从而提出了相应的优化算法,使得计算性能从最坏情况O(kn4)提高到O(knl).这里,k是迭代次数,n是结点数,l是边数.最后,通过实验验证了Super Sim Rank优于Sim Rank和Personalized Page Rank,同时验证了优化算法在各种情况下都是有效的.
Information networks are ubiquitous. These networks can be modeled by graphs where nodes refer to objects and edges are relationships between objects in the networks. Measuring nodes similarity in a graph is a fundamental problem of graph data management. There are many real-world applications based on similarity between objects, such as networks analyses, information retrieval and recommendation systems. A number of link-based similarity measures have been proposed. Both SimRank and personalized PageRank are the most popular and influential similarity measures. The two measures differ from each other because they depend on different paths. This paper proposes a similarity measure that takes advantages of both SimRank and personalized PageRank. The new measure strengthens the principle of SimRank: "Two objects are similar if they are referenced by similar objects". The paper also analyzes the new similarity measure in theory and proposes an optimization algorithm which reduces the time cost from O(kn^4) to O(knl) where k is the number of iteration, n is the number of nodes and l is the number of edges. Experimental results demonstrate the effectiveness of the new similarity measure and efficiency of the optimization algorithm.
出处
《软件学报》
EI
CSCD
北大核心
2014年第11期2602-2615,共14页
Journal of Software
基金
国家重点基础研究发展计划(973)(2014CB340402
2012CB316205)
国家自然科学基金(61272137
61033010
61202114)
国家社会科学基金(12&ZD220)
国家高技术研究发展计划(863)(2014AA015200)
国家高等学校学科创新引智计划