
Research on Data Crawler of Electric Business Books Based on Python (基于Python的电商书籍数据爬虫研究) Cited by: 5
Abstract With the rapid development of the internet, e-commerce has become a main channel for everyday consumption. Taking the purchase of computer-related books as an example, shoppers need a clear view of the information on a large number of titles. To meet this need, the paper studies simulated browser login and web page parsing techniques built on Scrapy, a crawler framework for Python. The collected e-commerce book information is stored in a MongoDB database or on the local hard disk for subsequent data analysis. The resulting crawler is simple to program, runs stably, and effectively collects e-commerce book data.
Authors JIN Zhenjie (晋振杰); CAO Shaozhong (曹少中); XIANG Hongfeng (项宏峰); WANG Mingdao (王明道); LI Xinpei (李新佩) (Beijing Institute of Graphic Communication, Beijing 102600, China)
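Source Journal of Beijing Institute of Graphic Communication (《北京印刷学院学报》), 2018, No. 3, pp. 39-42 (4 pages)
Funding National Natural Science Foundation of China (61472461); National Major Scientific Instrument and Equipment Development Special Project (2013YQ140517)
Keywords e-commerce books; crawler; Python; Scrapy framework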
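
The abstract describes a pipeline of page parsing followed by storage on the local disk or in MongoDB. The sketch below illustrates only that parse-then-store shape. It is not the authors' implementation: it swaps Scrapy for Python's standard-library `html.parser` so that it runs without extra packages, and the sample markup, the `book`/`title`/`price` class names, and the `parse_books` and `save_jsonl` helpers are all hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical listing markup; a real e-commerce page will look different.
SAMPLE_HTML = """
<ul>
  <li class="book"><span class="title">Learning Python</span><span class="price">59.00</span></li>
  <li class="book"><span class="title">Fluent Python</span><span class="price">89.50</span></li>
</ul>
"""


class BookParser(HTMLParser):
    """Collect title/price text from <li class="book"> entries."""

    def __init__(self):
        super().__init__()
        self.books = []       # finished records
        self._current = None  # record being built, or None outside a book
        self._field = None    # name of the field whose text we are inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "book":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.books.append(self._current)
            self._current = None


def parse_books(html):
    """Return a list of {"title": str, "price": float} dicts."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    for book in parser.books:
        if "title" in book:
            book["title"] = book["title"].strip()
        if "price" in book:
            book["price"] = float(book["price"])
    return parser.books


def save_jsonl(books, path):
    """Write one JSON object per line to a local file."""
    with open(path, "w", encoding="utf-8") as fh:
        for book in books:
            fh.write(json.dumps(book, ensure_ascii=False) + "\n")
```

Calling `save_jsonl(parse_books(SAMPLE_HTML), "books.jsonl")` covers the local-disk option. For the MongoDB option, the same list of dicts could instead be passed to a PyMongo collection's `insert_many`, assuming a reachable server and a collection chosen by the reader.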
