Design of a Semantic Web Crawler Based on Multi-agent Reinforcement Learning

Abstract: The sheer volume of information on the Web gave rise to search engines, while the rapid growth and frequent updating of Web data place ever higher demands on them. Parallel search engines can speed up crawling and improve update efficiency. The Semantic Web is a vision of the future Web, and its data faces the same growth and update problems as the traditional Web, which makes parallel search engines for the Semantic Web an important research direction. This paper describes how to design a basic parallel crawler system for the Semantic Web. The system consists of a central controller and several sub-crawlers. The central controller assigns crawl tasks to the crawlers and aggregates the data they collect; the sub-crawlers fetch pages and extract URLs. In addition to processing RDF documents, each sub-crawler uses reinforcement learning to try to discover more links to RDF documents from traditional HTML pages.
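The controller and sub-crawler split described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The names `assign_crawler`, `run_controller`, `fetch_links` and the toy link graph are all hypothetical, the hash-based task assignment is an assumption, the sub-crawlers run in turn rather than in parallel, and the reinforcement learning component is left out entirely.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Map a URL to a crawler index with a deterministic hash (an assumed policy)."""
    return sum(url.encode("utf-8")) % num_crawlers


def run_controller(
    seeds: List[str],
    fetch_links: Callable[[str], List[str]],
    num_crawlers: int,
) -> Dict[str, List[str]]:
    """Dispatch URLs to per-crawler queues and aggregate what each crawler extracts."""
    queues = [deque() for _ in range(num_crawlers)]
    seen: Set[str] = set()
    collected: Dict[str, List[str]] = {}

    def dispatch(url: str) -> None:
        # The controller hands each unseen URL to exactly one crawler.
        if url not in seen:
            seen.add(url)
            queues[assign_crawler(url, num_crawlers)].append(url)

    for seed in seeds:
        dispatch(seed)

    # Round-robin over the queues stands in for crawlers working in parallel.
    while any(queues):
        for queue in queues:
            if not queue:
                continue
            url = queue.popleft()
            links = fetch_links(url)   # sub-crawler: fetch and extract URLs
            collected[url] = links     # controller: aggregate the results
            for link in links:
                dispatch(link)
    return collected


# Hypothetical link graph used in place of real network access.
TOY_WEB = {
    "http://example.org/a.rdf": ["http://example.org/b.rdf", "http://example.org/c.html"],
    "http://example.org/b.rdf": ["http://example.org/a.rdf"],
    "http://example.org/c.html": ["http://example.org/d.rdf"],
    "http://example.org/d.rdf": [],
}


def toy_fetch(url: str) -> List[str]:
    """Return the outgoing links recorded for a URL in the toy graph."""
    return TOY_WEB.get(url, [])


result = run_controller(["http://example.org/a.rdf"], toy_fetch, num_crawlers=2)
print(sorted(result))
```

In a fuller design, the learning component would presumably sit in the step that decides which links extracted from an HTML page are worth dispatching. Here every unseen link is dispatched.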

Author: Xie Fengping (谢枫平)

Affiliation: Department of Computer Science, Minxi Vocational and Technical College (闽西职业技术学院)

Source: Journal of ZhangZhou Teachers College (Natural Science) (《漳州师范学院学报（自然科学版）》), 2010, No. 4, pp. 63-68 (6 pages)

Keywords: Semantic Web; parallel web crawler; reinforcement learning

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

References (8)

1. Cheng G, Qu Y. Term dependence on the semantic web[C]. In: Proc. of the 7th International Semantic Web Conference (ISWC). LNCS 5318, 2008, 665-680.
2. Cheng G, Qu Y. Falcons: Searching and Browsing Entities on the Semantic Web[C]. In: Proc. of the 17th International Conference on World Wide Web (WWW), 2008, 1101-1102.
3. Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine[C]. In: Proc. of 7th International World Wide Web Conference (WWW), 1998, 107-117.
4. Ding L, et al. Swoogle: a search and metadata engine for the semantic web[C]. In: Proc. of the 13th ACM international conference on Information and knowledge management, 2004, 652-659.
5. Tummarello G, et al. Sindice: Weaving the Open Linked Data[C]. In: Proc. of 6th International Semantic Web Conference (ISWC), 2007, 11-15.
6. Kaelbling L P, et al. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, 237-285.
7. Bellman R E. Dynamic Programming[M]. Princeton University Press, 1957.
8. Rennie J, McCallum A K. Using Reinforcement Learning to Spider the Web Efficiently[C]. In: Proc. of 16th International Conference on Machine Learning, 1999, 335-343.
