Abstract
The search strategy of a topic-focused web crawler is the core technology of a domain-specific search engine, and the performance of the crawler's search algorithm directly determines the performance of the search engine. Our study found that the Best-First algorithm performs best among the many search algorithms examined. However, it converges too quickly, which leaves the search engine with low recall. To address this, we adjust and improve the Best-First algorithm and implement it in Java.
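For readers unfamiliar with the baseline, the following is a minimal sketch of the plain (unimproved) Best-First crawl order. It is not the authors' improved algorithm, which this abstract does not describe. The class `BestFirstCrawler`, the method `crawl`, the in-memory link graph, and the precomputed relevance scores are all hypothetical stand-ins: a real focused crawler would fetch each page and score candidate links against the topic. The frontier is a priority queue that always expands the highest-scoring URL next.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch of a baseline Best-First focused crawl (not the paper's improved version).
final class BestFirstCrawler {

    // Frontier entry: URL, assumed topic-relevance score, and an insertion
    // counter so that ties are broken in first-in, first-out order.
    static final class Candidate {
        final String url;
        final double score;
        final long seq;

        Candidate(String url, double score, long seq) {
            this.url = url;
            this.score = score;
            this.seq = seq;
        }
    }

    // graph: url -> outgoing links (stands in for fetching and parsing a page)
    // relevance: url -> assumed relevance score; unknown URLs score 0.0
    // Returns the order in which pages are visited, up to maxPages.
    static List<String> crawl(Map<String, List<String>> graph,
                              Map<String, Double> relevance,
                              String seed, int maxPages) {
        PriorityQueue<Candidate> frontier = new PriorityQueue<>((a, b) -> {
            int byScore = Double.compare(b.score, a.score); // higher score first
            return byScore != 0 ? byScore : Long.compare(a.seq, b.seq);
        });
        Set<String> seen = new HashSet<>(); // URLs already enqueued
        List<String> visited = new ArrayList<>();
        long seq = 0;

        frontier.add(new Candidate(seed, relevance.getOrDefault(seed, 0.0), seq++));
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            Candidate best = frontier.poll(); // greedily take the most promising URL
            visited.add(best.url);
            for (String link : graph.getOrDefault(best.url, List.of())) {
                if (seen.add(link)) { // enqueue each URL at most once
                    frontier.add(new Candidate(link, relevance.getOrDefault(link, 0.0), seq++));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "A", List.of("B", "C"), "B", List.of("D"), "C", List.of("E", "F"));
        Map<String, Double> relevance = Map.of(
                "A", 1.0, "B", 0.2, "C", 0.9, "D", 0.95, "E", 0.5, "F", 0.1);
        System.out.println(crawl(graph, relevance, "A", 3));
    }
}
```

In this toy graph the highly relevant page `D` is reachable only through the low-scoring page `B`, so with a budget of three pages the greedy order visits `A`, `C`, `E` and never reaches `D`. This illustrates one way a strictly greedy frontier can hurt recall.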
Source
Microcomputer Applications (《微型电脑应用》)
2009, No. 2, pp. 56-58, 47 (4 pages)