期刊文献+

Accuracy estimation of link-based similarity measures and its application 被引量:1

Accuracy estimation of link-based similarity measures and its application
原文传递
导出
摘要 Link-based similarity measures play a significant role in many graph based applications. Consequently, mea- suring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and Sim- Rank (SR) have emerged as the most popular and influen- tial link-based similarity measures. Recently, a novel link- based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases so does the overhead of the calcula- tion. The ideal solution is that computing similarity within the minimum number of iterations is sufficient to guaran- tee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing an accurate and tight upper bounds for PPR, SR, and P-Rank in the paper. Our upper bounds are designed based on the following intuition: the smaller the difference between the two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similar- ity scores becomes. Furthermore, we demonstrate the effec- tiveness of our upper bounds in the scenario of top-k similar nodes queries, where our upper bounds helps accelerate the speed of the query. We also run a comprehensive set of exper- iments on real world data sets to verify the effectiveness and efficiency of our upper bounds. Link-based similarity measures play a significant role in many graph based applications. Consequently, mea- suring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and Sim- Rank (SR) have emerged as the most popular and influen- tial link-based similarity measures. Recently, a novel link- based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases so does the overhead of the calcula- tion. The ideal solution is that computing similarity within the minimum number of iterations is sufficient to guaran- tee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing an accurate and tight upper bounds for PPR, SR, and P-Rank in the paper. Our upper bounds are designed based on the following intuition: the smaller the difference between the two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similar- ity scores becomes. Furthermore, we demonstrate the effec- tiveness of our upper bounds in the scenario of top-k similar nodes queries, where our upper bounds helps accelerate the speed of the query. We also run a comprehensive set of exper- iments on real world data sets to verify the effectiveness and efficiency of our upper bounds.
出处 《Frontiers of Computer Science》 SCIE EI CSCD 2016年第1期113-123,共11页 中国计算机科学前沿(英文版)
关键词 personalized PageRank SIMRANK P-RANK up-per bound personalized PageRank, SimRank, P-Rank, up-per bound
  • 相关文献

参考文献23

  • 1Gupta P, Gael A, Lin J, Sharma A, Wang D, Zadeh R. WTF: the who to follow service at Twitter. In: Proceedings of International World Wide Web Conference. 2013,505-514.
  • 2Liben-Nowell D, Kleinberg J. The link-prediction problem for social networks. Journal of the Association for Information Science and Tech?nology, 2007, 58(7); 1019-1031.
  • 3Joshi A, Kumar R, Reed B, Tomkins A. Anchor-based proximity mea?sures. In: Proceedings of International World Wide Web Conference. 2007, 1131-1132.
  • 4Antonellis I, Molina H G, Chang C C. SimRank++: query rewriting through link analysis of the click graph. Proceedings of the VLDB En?dowment, 2008, 1(1): 408--421.
  • 5Jeh G, Widorn J. Scaling personalized web search. In: Proceedings of International World Wide Web Conference. 2003,271-279.
  • 6Jeh 0, Widom J. SimRank: a measure of structural-context similarity. In: Proceedings of ACM SIGKDD Conference on Knowledge Discov?ery and Data Mining. 2002,538-543.
  • 7Sarkar P, Moore A W, Prakash A. Fast iucrernental proximity search in large graphs. In: Proceedings of International Conference on Machine Learning. 2008, 896-903.
  • 8Sarkar P, Moore A W. A tractable approach to finding closest truncated?commute-time neighbors in large graphs. In: Proceedings of Uncer?tainty in Artificial Intelligence. 2007,335-343.
  • 9Zhao P, Han J, Sun Y P-Rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 553- 562.
  • 10Lizorkin D, Velikhov P, Grinev M N, Turdakov D. Accuracy estimate and optimization techniques for SimRank computation. Proceedings of the VLDB Endowment, 2008, 1(1): 422-433.

同被引文献1

引证文献1

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部