Abstract
In information retrieval systems based on the vector space model (VSM), the TF-IDF scheme is widely used for keyword-based retrieval. Web pages, however, have a distinctive hyperlink structure, so a technique is needed that represents a page's content while also taking into account the content of the pages linked to it. This paper analyzes the essence of the vector space model, identifies the cause of its low precision, and proposes an improved VSM algorithm based on the hyperlink structure of Web pages. Experimental analysis shows that the improved algorithm raises retrieval precision by 10% compared with the original algorithm, improving retrieval effectiveness to some extent.
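The abstract does not give the paper's actual weighting formula, so the following is only a minimal sketch of the general idea: compute ordinary TF-IDF vectors, then mix each page's vector with the average vector of the pages it links to. The function names, the mixing weight `alpha`, and the toy pages are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each page id to a sparse {term: tf * idf} vector.

    docs: dict of page id -> list of tokens.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per page
    vectors = {}
    for page, tokens in docs.items():
        tf = Counter(tokens)
        vectors[page] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def blend_with_neighbors(vectors, links, alpha=0.7):
    """Mix each page vector with the mean vector of its linked pages.

    alpha is an assumed mixing weight (hypothetical, not from the paper):
    alpha for the page itself, 1 - alpha shared among its neighbors.
    """
    blended = {}
    for page, vec in vectors.items():
        neighbors = [p for p in links.get(page, []) if p in vectors]
        if not neighbors:
            blended[page] = dict(vec)  # no links: keep the plain TF-IDF vector
            continue
        out = {term: alpha * w for term, w in vec.items()}
        share = (1 - alpha) / len(neighbors)
        for nb in neighbors:
            for term, w in vectors[nb].items():
                out[term] = out.get(term, 0.0) + share * w
        blended[page] = out
    return blended


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (hypothetical): page "a" links to page "b".
docs = {
    "a": ["search", "engine"],
    "b": ["vector", "space", "model"],
    "c": ["hyperlink", "search"],
}
links = {"a": ["b"]}

plain = tf_idf_vectors(docs)
blended = blend_with_neighbors(plain, links)
query = {"vector": 1.0}
plain_score = cosine(query, plain["a"])
blended_score = cosine(query, blended["a"])
```

In this toy setup, page "a" never mentions "vector", so its plain TF-IDF vector cannot match the query, while the blended vector picks up a nonzero weight for that term from the linked page "b". Whether such blending improves precision in practice depends on the weighting scheme and the link graph, which is what the paper evaluates.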
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2005, No. 4, pp. 68-71, 77 (5 pages)
Keywords
computer application
Chinese information processing
search engine
information retrieval
vector space model
hyperlink