Abstract
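The HITS algorithm is a widely influential link analysis algorithm. In-depth study shows, however, that it is prone to topic drift, and that a large part of the cause is that pages are projected onto the wrong latent semantic basis. This paper proposes a weighted adjustments based hyperlinks topic distillation algorithm: while obtaining the root set, it first computes similarity using improved weights to get a relatively more accurate personalized root set, and then uses the HITS algorithm to compute the authority and hub values of Web pages. Experimental results show that the proposed algorithm considerably alleviates the topic drift caused by HITS and is better suited to Web queries.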
The HITS (hypertext-induced topic search) algorithm is one of the most important link analysis algorithms, but it suffers from topic drift. Much of this drift is found to arise because web pages are projected onto the wrong latent semantic basis. A new algorithm, WAHTD (weighted adjustments based hyperlinks topic distillation), is presented. It constructs a personalized root set and base set using weighted adjustments, then computes the authority and hub values of web pages with HITS to distill the topic. Experimental results show that WAHTD performs better than HITS in topic distillation quality and mitigates the topic drift problem, making it more suitable for Web queries.
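The final step described above, computing authority and hub values with HITS, can be sketched as follows. This is a minimal illustration of standard HITS power iteration only, not the paper's implementation: the abstract does not specify the weighted root-set construction, so it is not shown. The function name `hits`, the `links` adjacency format, and the iteration count are assumptions made for this example.

```python
import math


def hits(links, iterations=50):
    """Standard HITS power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub) dicts, each L2-normalized.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority of p: sum of hub scores of pages that link to p.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub of p: sum of authority scores of pages that p links to.
        new_hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return auth, hub
```

In a WAHTD-style pipeline, the `links` graph would presumably be restricted to the base set grown from the personalized root set before this iteration runs; that filtering step is an assumption here and would depend on the weighting scheme defined in the full paper.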
Source
《计算机工程与设计》
CSCD
PKU Core Journal
2007, Issue 2, pp. 294-296 (3 pages)
Computer Engineering and Design