
Improved web linkages topic distillation algorithm (cited by: 1)
Abstract: HITS (hypertext-induced topic search) is one of the most influential link analysis algorithms, but closer study shows that it is prone to topic drift. A large part of this drift arises because pages are projected onto the wrong latent semantic basis. This paper proposes WAHTD (weighted adjustments based hyperlinks topic distillation), a hyperlink topic distillation algorithm based on weight adjustment. While the root set is being collected, similarity is computed with improved weights, which yields a more accurate, personalized root set and base set. HITS is then used to compute the authority and hub values of the Web pages. Experimental results show that WAHTD performs better than HITS in topic distillation quality, mitigates the topic drift problem, and is therefore better suited to Web queries.
Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2007, No. 2, pp. 294-296 (3 pages)
Keywords: link analysis; topic distillation; vector space model (VSM); weighted adjustments; resource discovery
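The abstract's second stage, computing authority and hub values with HITS, can be illustrated with a minimal sketch. It assumes the base set is already available as a plain adjacency dictionary. The names `hits`, `normalize`, and `links`, the fixed iteration count, and the toy graph are illustrative assumptions rather than details from the paper. The weighted root-set construction is not shown, because the abstract does not give its weighting formula.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run the basic HITS iteration.

    links: dict mapping each page to an iterable of pages it links to
           (a hypothetical representation of the base set's link graph).
    Returns (authority, hub) dictionaries keyed by page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the authority scores of pages linked to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, ()))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return auth, hub


# Toy graph: pages "a" and "b" both link to "c".
example_links = {"a": ["c"], "b": ["c"], "c": []}
authority, hub_scores = hits(example_links)
print(max(authority, key=authority.get))  # page with the highest authority
```

In this toy graph, "c" ends up as the only authority and "a" and "b" as equal hubs. In a WAHTD-style pipeline, the graph passed to `hits` would be the base set grown from the similarity-filtered root set.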