
Link Relation-Based Web Page Similarity Search (基于链接关系的Web页面相似度搜索)

Cited by: 4
Abstract: Web page similarity search plays an important role in research areas such as Web news recommendation and approximate querying. SimRank is a classical similarity computation model, but its pre-computation time and space costs are very high, which makes it unsuitable for large-scale Web page networks. Exploiting SimRank's fast convergence, we propose WSR, an efficient Web page similarity search method built on SimRank. WSR pre-computes the 1-step iteration similarity matrix and then, at query time, uses that matrix to compute the 2-step iteration similarities between a given query page and the other pages. Static pruning of the Web network further improves the efficiency of both pre-computation and online query processing. Experimental results show that WSR significantly reduces storage cost and pre-computation time while achieving high accuracy and fast query response.
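The two-phase idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' WSR implementation: it uses the standard SimRank recurrence from Jeh and Widom, omits the static pruning step entirely, and the function names (`one_step_matrix`, `two_step_query`), the decay factor `c=0.8`, and the toy graph are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def one_step_matrix(edges, c=0.8):
    """Offline phase: sparse 1-step SimRank scores for distinct page pairs.

    After one iteration, R1(a, b) = c * |I(a) & I(b)| / (|I(a)| * |I(b)|),
    where I(x) is the set of pages linking to x. Only non-zero pairs are kept.
    """
    incoming = defaultdict(set)   # page -> pages that link to it
    outgoing = defaultdict(set)   # page -> pages it links to
    for src, dst in edges:
        incoming[dst].add(src)
        outgoing[src].add(dst)

    # Two pages share an in-neighbour exactly when some page links to both.
    common = defaultdict(int)
    for targets in outgoing.values():
        for a, b in combinations(sorted(targets), 2):
            common[(a, b)] += 1

    r1 = {
        (a, b): c * n / (len(incoming[a]) * len(incoming[b]))
        for (a, b), n in common.items()
    }
    return r1, dict(incoming)


def lookup(r1, a, b):
    """Read a symmetric 1-step score; a page is fully similar to itself."""
    if a == b:
        return 1.0
    key = (a, b) if a < b else (b, a)
    return r1.get(key, 0.0)


def two_step_query(query, r1, incoming, c=0.8):
    """Online phase: 2-step SimRank between `query` and every other page."""
    iq = incoming.get(query, set())
    scores = {}
    if not iq:
        return scores  # a page without in-links has zero similarity to others
    for page, ip in incoming.items():
        if page == query:
            continue
        total = sum(lookup(r1, i, j) for i in iq for j in ip)
        if total > 0:
            scores[page] = c * total / (len(iq) * len(ip))
    return scores


# Toy graph (hypothetical): a university page links to two professors,
# and each professor links to one student.
edges = [
    ("univ", "profA"), ("univ", "profB"),
    ("profA", "studentA"), ("profB", "studentB"),
]
r1, incoming = one_step_matrix(edges)
print(two_step_query("studentA", r1, incoming))
```

In this toy graph the two student pages share no in-neighbour, so their 1-step score is zero and the pair is absent from the stored matrix; the online 2-step pass still relates them through the similarity of the two professor pages. Only the sparse 1-step matrix is stored, which is the source of the storage saving the abstract describes.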
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, PKU Core Journal, 2014, Issue 1, pp. 57-61 (5 pages)
Funding: Natural Science Foundation of Shanxi Province (2012011014-2)
Keywords: Web page network; similarity search; SimRank

References (16)

  • 1. Jeh G, Widom J. SimRank: A measure of structural-context similarity [C]// Proc. of SIGKDD, 2002.
  • 2. Jeh G, Widom J. Scaling personalized web search [C]// Proc. of WWW, 2002.
  • 3. Small H G. Co-citation in the scientific literature: A new measure of the relationship between two documents [J]. Journal of the American Society for Information Science, 1973, 24(4): 265-269.
  • 4. Kessler M M. Bibliographic coupling between scientific papers [J]. American Documentation, 1963, 14: 10-25.
  • 5. Popescul A, Flake G, Lawrence S, et al. Clustering and identifying temporal trends in document databases [C]// Proc. of the IEEE Advances in Digital Libraries, 2000.
  • 6. Small H. Co-citation in the scientific literature: A new measure of the relationship between two documents [J]. Journal of the American Society for Information Science, 1973, 4: 265-269.
  • 7. Larson R R. Bibliometrics of the World-Wide Web: An exploratory analysis of the intellectual structure of cyberspace [C]// Proc. of the Annual Meeting of the American Society for Information Science, Baltimore, Maryland, October 1996.
  • 8. Pitkow J, Pirolli P. Life, death, and lawfulness on the electronic frontier [C]// Proc. of the Conference on Human Factors in Computing Systems, Atlanta, Georgia, 1997.
  • 9. Lin Z, King I, Lyu M R. PageSim: A novel link-based similarity measure for the World Wide Web [C]// Proc. of WI, 2006: 687-693.
  • 10. Fogaras D, Racz B. Scaling link-based similarity search [C]// Proc. of WWW, 2005: 641-650.

