Abstract
Building on an analysis of existing spider techniques, this paper proposes a controllable web crawling method that applies reinforcement learning to vertical search engines. The method uses reinforcement learning to acquire control "experience information", uses that information to predict more distant rewards, and crawls according to a given topic so as to maximise the cumulative reward. The retrieved pages are then stored and indexed, so that users can obtain the best search results through the search engine's search interface. Topic-focused crawls were run on multiple websites, and the experimental results show that the method considerably improves both recall and precision.
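The idea of ranking links by predicted distant reward can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy link graph, the discount factor `gamma`, and the helper names `link_value` and `crawl` are all assumptions made for illustration. In a real crawler the value estimate would be learned from earlier crawls, whereas here it is computed directly from the toy graph.

```python
import heapq

# Hypothetical toy site: page -> outgoing links (an assumption, not data from the paper).
GRAPH = {"home": ["news", "hub"], "hub": ["topic1", "topic2"], "news": ["ad"]}
# Hypothetical set of on-topic pages; fetching one yields an immediate reward of 1.
RELEVANT = {"topic1", "topic2"}


def link_value(graph, relevant, url, gamma=0.5, depth=3):
    """Immediate reward of `url` plus the discounted best value reachable from it."""
    immediate = 1.0 if url in relevant else 0.0
    if depth == 0:
        return immediate
    future = max(
        (link_value(graph, relevant, nxt, gamma, depth - 1) for nxt in graph.get(url, [])),
        default=0.0,
    )
    return immediate + gamma * future


def crawl(graph, start, estimate, budget):
    """Best-first crawl: always fetch the frontier URL with the highest estimated value."""
    frontier = [(-estimate(start), start)]  # negate because heapq is a min-heap
    seen = {start}
    fetched = []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        fetched.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-estimate(link), link))
    return fetched


order = crawl(GRAPH, "home", lambda u: link_value(GRAPH, RELEVANT, u), budget=4)
print(order)
```

In this sketch the page `hub` carries no immediate reward, yet it is fetched before `news` because its links lead to on-topic pages. That is the behaviour the abstract attributes to predicting more distant rewards.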
Source
《计算机应用与软件》
CSCD
2011, No. 12, pp. 183-187 (5 pages)
Computer Applications and Software
Keywords
Controllable reinforcement learning
Vertical search engine
Web spider