Abstract
With the rapid development of the Internet, e-commerce has become the main channel for everyday consumption. When buying computer books, for example, shoppers need a clear picture of the many kinds of books on offer. To meet this need, this paper studies simulated browser login and web-page parsing based on Scrapy, a crawler framework written in Python. The crawler stores the collected e-commerce book information in a MongoDB database or on the local hard disk for later data analysis. The resulting crawler is simple to program, stable in performance, and effectively collects e-commerce book data.
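The abstract describes a pipeline in which pages are parsed and the extracted book records are written either to MongoDB or to local disk. The minimal sketch below covers only the parse-and-store half, under stated assumptions. It uses Python's standard-library `html.parser` instead of Scrapy selectors, so it runs without Scrapy installed. The listing markup (`li.book`, `.title`, `.price`) and the sample HTML are hypothetical. `JsonLinesPipeline` is also a hypothetical class: it borrows the hook names of a Scrapy item pipeline (`open_spider`, `process_item`, `close_spider`) but writes JSON Lines to a local file. This is not the authors' code, and the simulated-login step is not shown.

```python
import json
import os
import tempfile
from html.parser import HTMLParser


class BookListParser(HTMLParser):
    """Collects book records from a HYPOTHETICAL listing page, assumed to look like:
    <li class="book"><a class="title">...</a><span class="price">...</span></li>
    """

    def __init__(self):
        super().__init__()
        self.books = []       # finished records
        self._current = {}    # record being built for the open <li class="book">
        self._field = None    # "title" or "price" while inside that element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "li" and cls == "book":
            self._current = {}
        elif cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        # Accumulate text in case the parser delivers it in several chunks.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current:
            book = {k: v.strip() for k, v in self._current.items()}
            if "price" in book:
                book["price"] = float(book["price"])  # assumes a bare number like "59.00"
            self.books.append(book)
            self._current = {}


class JsonLinesPipeline:
    """Hypothetical local-disk pipeline using Scrapy-style hook names."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line, keeping non-ASCII titles readable.
        self.file.write(json.dumps(item, ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()


SAMPLE_HTML = """
<ul>
  <li class="book"><a class="title">Python Basics</a><span class="price">59.00</span></li>
  <li class="book"><a class="title">Web Scraping Notes</a><span class="price">45.50</span></li>
</ul>
"""


def crawl_page(html, pipeline):
    """Parse one page and push every extracted record through the pipeline."""
    parser = BookListParser()
    parser.feed(html)
    parser.close()
    pipeline.open_spider(spider=None)
    for book in parser.books:
        pipeline.process_item(book, spider=None)
    pipeline.close_spider(spider=None)
    return parser.books


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "books.jl")
    print(crawl_page(SAMPLE_HTML, JsonLinesPipeline(out)))
```

In an actual Scrapy project, a MongoDB-backed pipeline would typically create a `pymongo.MongoClient` in `open_spider` and call `insert_one` on a collection in `process_item`. Keeping storage inside a pipeline class means that switching between MongoDB and local files only requires swapping that class.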
Authors
晋振杰
曹少中
项宏峰
王明道
李新佩
JIN Zhenjie; CAO Shaozhong; XIANG Hongfeng; WANG Mingdao; LI Xinpei (Beijing Institute of Graphic Communication, Beijing 102600, China)
Source
Journal of Beijing Institute of Graphic Communication, 2018, No. 3, pp. 39-42 (4 pages)
Funding
National Natural Science Foundation of China (61472461)
National Major Scientific Instrument and Equipment Development Project (2013YQ140517)