Abstract
Results returned by traditional search engines often suffer from low precision and redundant information. The HITS algorithm, which is based on Web structure mining, can improve the effectiveness of page search. Building on an in-depth analysis of HITS and related improved algorithms, this paper proposes a similarity-based vector space projection HITS algorithm. The algorithm combines page text content with hyperlink structure analysis, and can largely eliminate the topic drift exhibited by HITS without adding extra system overhead.
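The abstract does not give the projection formula, so the following is only a minimal sketch of the general idea of combining text similarity with hyperlink analysis, not the paper's method. It assumes each page has a term vector, uses cosine similarity to a query vector as the relevance weight, and scales each page's contribution in the standard HITS hub/authority updates by that weight. The names `cosine_similarity` and `weighted_hits` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def _normalize(scores):
    """Scale scores to unit L2 norm; leave them unchanged if all are zero."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        return scores
    return {page: s / norm for page, s in scores.items()}


def weighted_hits(links, page_vectors, query_vector, iterations=50):
    """HITS iteration with similarity-weighted contributions (assumed scheme).

    links: list of (source, target) pairs; every page must be in page_vectors.
    page_vectors: dict mapping page id to its term vector.
    Returns (authority, hub) dicts.
    """
    pages = list(page_vectors)
    # Assumption: relevance is the cosine similarity between page and query.
    relevance = {p: cosine_similarity(page_vectors[p], query_vector) for p in pages}
    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority: sum of hub scores of in-linking pages, weighted by
        # how relevant each linking page is to the query.
        new_authority = {p: 0.0 for p in pages}
        for source, target in links:
            new_authority[target] += relevance[source] * hub[source]
        authority = _normalize(new_authority)

        # Hub: sum of authority scores of linked pages, weighted by
        # how relevant each linked page is to the query.
        new_hub = {p: 0.0 for p in pages}
        for source, target in links:
            new_hub[source] += relevance[target] * authority[target]
        hub = _normalize(new_hub)

    return authority, hub
```

In plain HITS, a densely linked off-topic cluster can dominate the authority ranking, which is the topic drift the abstract refers to. In this sketch, links from pages with zero similarity to the query contribute nothing, so such a cluster cannot outrank on-topic pages purely through link count.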
Source
Modern Computer (《现代计算机》)
2009, No. 10, pp. 20-22, 37 (4 pages)
Funding
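Natural Science Foundation Project of the Chongqing Science and Technology Commission (CSTC, No. 2007BB2439)
Chongqing Municipal Education Commission Fund Project (No. 0634167)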