Abstract
Web crawlers fall into two main categories: general crawlers and focused crawlers. The former usually refers to the crawlers used by search engines, while the latter target specific websites. Focused crawlers compensate for the shortcomings of general search engines and are applied in vertical search engines, which are retrieval tools for targeted information acquisition. Taking the acquisition of Douban book information as an example, this paper introduces the working principle, classification, application scenarios, and key technologies of web crawlers, and examines in detail the basic methods and workflow for designing and implementing a focused crawler in Python.
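The distinction between a general and a focused crawler can be illustrated with a minimal sketch. This is not the paper's implementation: the in-memory `PAGES` dictionary, the `books.example` host, the `/subject/` path rule, and the function names `focused_crawl` and `is_book_page` are all assumptions made for illustration, and no real Douban markup or URL scheme is implied. The sketch uses only the Python standard library and replaces network access with an injected `fetch` callable, so it runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTitleParser(HTMLParser):
    """Collects href values from <a> tags and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def focused_crawl(seed_url, fetch, is_relevant, max_pages=10):
    """Breadth-first crawl that only follows URLs accepted by is_relevant.

    fetch(url) must return the page HTML as a string, or None on failure.
    Returns a dict mapping each crawled URL to its page title.
    """
    frontier = deque([seed_url])
    seen = {seed_url}
    results = {}
    while frontier and len(results) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        results[url] = parser.title.strip()
        for href in parser.links:
            absolute = urljoin(url, href)
            # The relevance filter is what makes the crawler "focused".
            if absolute not in seen and is_relevant(absolute):
                seen.add(absolute)
                frontier.append(absolute)
    return results


# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "https://books.example/subject/1": (
        '<title>Book One</title><a href="/subject/2">next</a>'
        '<a href="https://other.example/ad">ad</a>'
    ),
    "https://books.example/subject/2": (
        '<title>Book Two</title><a href="/subject/1">back</a>'
        '<a href="/login">login</a>'
    ),
}


def is_book_page(url):
    """Assumed relevance rule: same host, path under /subject/."""
    parts = urlparse(url)
    return parts.netloc == "books.example" and parts.path.startswith("/subject/")


crawled = focused_crawl("https://books.example/subject/1", PAGES.get, is_book_page)
```

A general crawler would correspond to an `is_relevant` that accepts every URL. In a real deployment, `fetch` would perform an HTTP request and should respect the target site's robots.txt and rate limits.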
Author
DU Chao (杜超), Hubei University of Education, Wuhan 430205
Source
Modern Manufacturing Technology and Equipment (《现代制造技术与装备》)
2020, No. 12, pp. 30-31 (2 pages)