Abstract
To address the "topic drift" problem of the PageRank algorithm, this paper proposes an improved method that incorporates the vector space model (VSM). The method first computes a PageRank value from the link structure of Web pages, then builds a content feature vector space for the pages and computes their topic content similarity. Finally, it combines the two values using weight coefficients to produce a new PageRank value. Comparative experiments show that the improved PageRank algorithm reduces the number of irrelevant Web pages and gives search engines better ranking results.
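The three steps named in the abstract (link-based PageRank, VSM content similarity, weighted fusion) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names `pagerank`, `cosine_similarity`, and `fused_scores`, the damping factor of 0.85, raw term-frequency vectors, the rescaling of PageRank by its maximum, and the linear fusion `alpha * rank + (1 - alpha) * similarity`. It is not the authors' implementation.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page with no out-links spreads its rank over all pages.
            targets = links[p] or pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts as raw term-frequency vectors (VSM)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_scores(links, contents, topic, alpha=0.5):
    """Blend link-based rank and topic similarity (assumed linear fusion)."""
    rank = pagerank(links)
    max_rank = max(rank.values())
    return {
        # Rescale PageRank to [0, 1] so both terms are on a comparable scale.
        p: alpha * rank[p] / max_rank
        + (1.0 - alpha) * cosine_similarity(contents[p], topic)
        for p in links
    }


if __name__ == "__main__":
    # Hypothetical four-page graph and page texts, for illustration only.
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    contents = {
        "a": "pagerank link analysis for web search",
        "b": "cooking recipes and kitchen tips",
        "c": "web search ranking with pagerank",
        "d": "football scores and match reports",
    }
    scores = fused_scores(links, contents, "pagerank web search", alpha=0.5)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this sketch, `alpha` plays the role of the weight coefficient: `alpha=1.0` reduces to pure link-based ranking, while smaller values let content similarity pull off-topic pages down the list.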
Source
《计算机与现代化》
2011, No. 7, pp. 96-98, 101, 104 (5 pages in total)
Computer and Modernization