Abstract
Building on an analysis and evaluation of the crawling strategies commonly used by subject search ROBOTs, this paper designs an integrative crawling strategy for an automatic subject search engine ROBOT by combining a triple-filtering technique with an improved Shark heuristic search algorithm. Because the integrative strategy takes page relevance, subject precision, and page quality into account during crawling, using it to download subject-relevant pages from the Web makes it possible both to widen the resource coverage of a subject through link analysis and to keep the search results highly relevant to that subject.
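The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a Shark-style best-first crawl could be combined with three filtering stages. Everything in it is an assumption made for illustration: the toy in-memory web `TOY_WEB`, the keyword-overlap `relevance` measure, the use of in-link counts as a quality proxy, and the parameter names `decay`, `link_min`, `page_min`, and `quality_min`. It is not the paper's implementation.

```python
import heapq

# Hypothetical toy web: url -> (page text, outgoing links). Illustrative data only.
TOY_WEB = {
    "a": ("web crawler search engine topic", ["b", "c", "d"]),
    "b": ("focused crawler topic search strategy", ["d", "e"]),
    "c": ("cooking recipes and travel notes", ["e", "f"]),
    "d": ("search engine crawler link analysis", ["a"]),
    "e": ("football scores", []),
    "f": ("holiday photos", ["g"]),
    "g": ("topic search crawler hidden page", []),
}


def relevance(text, topic_terms):
    """Fraction of topic terms found in the text (a stand-in similarity measure)."""
    words = set(text.split())
    return sum(term in words for term in topic_terms) / len(topic_terms)


def crawl(seed, topic_terms, decay=0.5, link_min=0.2, page_min=0.6, quality_min=1):
    """Best-first crawl with Shark-style score inheritance and three assumed filters."""
    # Assumed quality proxy: number of in-links in the toy graph.
    inlinks = {}
    for _, links in TOY_WEB.values():
        for target in links:
            inlinks[target] = inlinks.get(target, 0) + 1

    frontier = [(-1.0, seed)]  # max-heap emulated with negated scores
    seen = {seed}
    visited, accepted = [], []

    while frontier:
        neg_score, url = heapq.heappop(frontier)
        inherited = -neg_score
        text, links = TOY_WEB[url]
        visited.append(url)
        page_score = relevance(text, topic_terms)

        # Filters 2 and 3 (assumed): keep a page only if it is on topic
        # and meets the quality threshold.
        if page_score >= page_min and inlinks.get(url, 0) >= quality_min:
            accepted.append(url)

        # Shark-style inheritance: a relevant page passes on its own decayed
        # score; an irrelevant page passes on its decayed inherited score.
        child_score = decay * (page_score if page_score > 0 else inherited)

        # Filter 1 (assumed): drop links whose potential score is too low.
        if child_score < link_min:
            continue
        for target in links:
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-child_score, target))

    return accepted, visited


accepted, visited = crawl("a", ["crawler", "search", "topic"])
print(accepted)
print(visited)
```

In this toy run the relevant page `g` is never fetched, because it is reachable only through two consecutive irrelevant pages and its inherited score falls below `link_min`. That illustrates the trade-off the abstract refers to: stricter filtering raises relevance, while tolerance for weakly scored links raises coverage.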
Source
Journal of Wuhan University of Technology (《武汉理工大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2006, No. 2, pp. 74-76 (3 pages)
Funding
Natural Science Foundation of Hubei Province (2004ABA061)
Keywords
subject search engine
web spider
integrative crawling strategy