Refinement of the Vector Space Model Based on Documents Hyperlink Structures
(Original title: 一种基于超链接结构的向量空间模型改进算法)
Abstract: In information retrieval systems based on the vector space model (VSM), the TF-IDF scheme is widely used for keyword-based retrieval. Web pages, however, have a distinctive hyperlink structure, so a technique is needed that represents the content of a page while also taking into account the content of the pages linked to it. This paper analyzes the essence of the vector space model, identifies the cause of its low precision, and proposes an improved VSM algorithm, built on the traditional model, that uses the hyperlink structure of Web pages. Experimental analysis shows that the improved algorithm raises retrieval precision by 10% compared with the original algorithm, improving retrieval effectiveness to some extent.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2005, No. 4, pp. 68-71, 77 (5 pages)
Keywords: computer application; Chinese information processing; search engine; information retrieval; vector space model; hyperlink
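The abstract does not give the paper's weighting formula, so the following is only a minimal sketch of the general idea it describes: build TF-IDF vectors for pages, then fold in the content of hyperlink-neighboring pages. Everything specific here is an assumption made for illustration and not the authors' method: the function names, the mixing weight `alpha`, the plain average over neighbors, and the toy pages.

```python
import math
from collections import Counter, defaultdict


def tf_idf_vectors(docs):
    """Map each page id to a sparse {term: tf * idf} vector.

    docs: dict of page id -> list of tokens.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per page
    vectors = {}
    for page, tokens in docs.items():
        tf = Counter(tokens)
        vectors[page] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def blend_with_neighbors(vectors, links, alpha=0.3):
    """Mix each page vector with the mean vector of its linked pages.

    links: dict of page id -> iterable of neighboring page ids.
    alpha: hypothetical share of weight given to neighbors (an assumption).
    Pages without known neighbors keep their original vector.
    """
    blended = {}
    for page, own in vectors.items():
        neighbors = [p for p in links.get(page, ()) if p in vectors]
        if not neighbors:
            blended[page] = dict(own)
            continue
        mixed = defaultdict(float)
        for term, weight in own.items():
            mixed[term] += (1 - alpha) * weight
        for p in neighbors:
            for term, weight in vectors[p].items():
                mixed[term] += alpha * weight / len(neighbors)
        blended[page] = dict(mixed)
    return blended


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (hypothetical): page "a" links to page "b".
pages = {
    "a": ["vector", "space", "model"],
    "b": ["hyperlink", "structure", "web"],
    "c": ["search", "engine", "web"],
}
links = {"a": ["b"]}

plain = tf_idf_vectors(pages)
blended = blend_with_neighbors(plain, links)
query = {"hyperlink": 1.0}
print(cosine(query, plain["a"]), cosine(query, blended["a"]))
```

In this toy setup, page "a" shares no term with the query under plain TF-IDF, but becomes retrievable once the vector of the page it links to is mixed in. Whether such blending helps precision on real data depends on the choice of weight and on which neighbors are included, which the abstract does not specify.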
