Abstract
The rapid growth of the Web has posed unprecedented challenges for traditional search engines, and vertical search engines, which collect information relevant to a specific topic or domain, have emerged in response. The crawling strategy and the topic relevance algorithm of the web spider are the core of a vertical search engine. This article describes one topic relevance algorithm, the HITS algorithm, in detail and proposes an improved topic relevance algorithm based on it. Experiments show that the improved HITS algorithm raises the topic relevance of the crawled pages and helps the web spider crawl information on a specific topic.
Source
《电脑知识与技术(过刊)》
2009, Issue 10X, pp. 8116-8118 (3 pages)
Computer Knowledge and Technology
Funding
Natural Science Basic Research Program of Shaanxi Province (2007F52)