The Research of Resemblance Graph-HITS Algorithm Based on the Specific Topic on Web

Original title: Web上基于特定主题的RG-HITS算法研究 (cited by: 2)
Abstract (translated from the Chinese): Web information retrieval (IR) research applies the results of text retrieval research and combines them with ideas from Web graph theory to study information retrieval on the Web; it is an effective route to Web knowledge discovery. The information obtained by the traditional HITS method has quite low precision, while PageRank, as a general-purpose search method, cannot be applied to acquiring topic-specific information. Based on a thorough analysis of existing algorithms such as PageRank and HITS and of similarity computation methods for Web documents, the RG-HITS algorithm is proposed for discovering information relevant to a specific topic queried on the Web. It combines Web hyperlinks, the information relevance derived from the knowledge representation of Web pages, and the HITS method to search for topic-specific knowledge on the Web.

Abstract (English, as published): Information retrieval (IR) on the Web is the automatic retrieval of all relevant documents, that is, finding the intended Web documents while retrieving as few non-relevant documents as possible. Web IR is currently very popular. It concentrates on applying traditional text IR methods to the Internet, together with the properties of the Web graph. This research focuses on how to obtain relevant Web pages and contents effectively and broadly, filter Web pages, and assign proper labels to them. Accurately finding user-specific information on the Web is very difficult. Traditional Web search engines take a query as input and produce a set of (hopefully) relevant pages that match the query terms. While useful in many circumstances, search engines have the disadvantage that users have to formulate queries that specify their information need, which is prone to errors. Based on a discussion of PageRank, HITS, and similarity between Web texts, new algorithms called RG-HITS (Resemblance Graph-HITS) for finding relevant documents on the Web are introduced.
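The abstract does not give the RG-HITS update rules, so the following is only a minimal sketch of the general idea it describes: running HITS over a link graph while weighting each link by how similar the target page's text is to the topic. It should not be read as the paper's algorithm. The function names `cosine_similarity`, `normalize`, and `weighted_hits`, the bag-of-words cosine measure, and the choice to weight a link by the relevance of its target are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else dict(scores)


def weighted_hits(links, page_text, topic, iterations=50):
    """HITS with each link u -> v weighted by sim(topic, text of v).

    links: dict mapping a page id to the list of page ids it links to.
    page_text: dict mapping every page id to its text.
    Returns (authority, hub) dicts, each L2-normalized.
    The weighting scheme is an illustrative assumption, not the paper's.
    """
    pages = list(page_text)
    relevance = {p: cosine_similarity(topic, page_text[p]) for p in pages}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for u, targets in links.items():
            for v in targets:
                new_auth[v] += relevance[v] * hub[u]
        # Hub: weighted sum of authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for u, targets in links.items():
            for v in targets:
                new_hub[u] += relevance[v] * new_auth[v]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Hypothetical toy graph: two hub pages linking to two candidate pages.
    texts = {
        "a": "list of links",
        "d": "more links",
        "b": "web information retrieval search engines",
        "c": "cooking recipes pasta",
    }
    graph = {"a": ["b", "c"], "d": ["b", "c"]}
    authority, hubs = weighted_hits(graph, texts, "web information retrieval")
    for page in sorted(authority, key=authority.get, reverse=True):
        print(page, round(authority[page], 3))
```

With plain HITS, pages `b` and `c` in the toy graph would receive identical authority scores because they have the same in-links; the similarity weight is what separates the on-topic page from the off-topic one in this sketch.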
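Author: Ding Yi (丁一)
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2005, No. 6, pp. 26-29, 38 (5 pages)
Keywords: knowledge discovery (Web mining); Web page search (Web search); similarity computation (similarity scoring); information retrieval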