Design and Implementation of Focused Crawler Based on Python (基于Python的聚焦爬虫的初步设计与实现)
Cited by: 6
Abstract: Web crawlers fall into two main categories: general crawlers and focused crawlers. The former usually refers to the crawlers used by search engines, while the latter refers to crawlers that target specific websites. Focused crawlers compensate for the shortcomings of general search engines and are used in vertical search engines, which are retrieval tools for targeted information acquisition. Taking the acquisition of Douban book information as an example, this paper introduces the working principle, classification, application scenarios, and key technologies of web crawlers, and studies in detail the basic method and process of designing and implementing a focused crawler in Python.
Author: DU Chao (杜超), Hubei University of Education, Wuhan 430205
Source: Modern Manufacturing Technology and Equipment (《现代制造技术与装备》), 2020, No. 12, pp. 30-31 (2 pages)
Keywords: web crawler; Python; lxml
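
The abstract's distinction between general and focused crawlers can be illustrated with a small sketch. This is not the paper's implementation: it assumes that "focused" simply means staying on one target host, and it replaces real HTTP fetching and lxml link extraction with a hypothetical in-memory link table (`PAGES`) on made-up `example.com` hosts so that it runs offline. The names `in_scope` and `focused_crawl` are illustrative only.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the web: page URL -> links found on that page.
# A real crawler would download each page and extract links (e.g. with lxml).
PAGES = {
    "https://books.example.com/": [
        "/subject/1/", "/tag/python", "https://ads.example.net/",
    ],
    "https://books.example.com/subject/1/": [
        "/subject/2/", "https://movies.example.com/",
    ],
    "https://books.example.com/subject/2/": ["/subject/1/"],
    "https://books.example.com/tag/python": ["/subject/2/"],
}


def in_scope(url, allowed_host):
    """Return True if the URL belongs to the single host being crawled."""
    return urlparse(url).netloc == allowed_host


def focused_crawl(seed, allowed_host, pages):
    """Breadth-first traversal that only follows links on allowed_host."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for href in pages.get(url, []):
            absolute = urljoin(url, href)  # resolve relative links
            if in_scope(absolute, allowed_host) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    for page in focused_crawl("https://books.example.com/",
                              "books.example.com", PAGES):
        print(page)
```

A general crawler would drop the `in_scope` check and follow every discovered link; the focused variant discards off-site links before they ever enter the queue.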