Design and Implementation of Data Acquisition System Based on Scrapy Technology (times cited: 14)
Abstract: Information on the Internet is enormous in volume and frequently updated. To address this, a data acquisition system was designed and implemented on top of the Scrapy crawler framework. The system collects data according to each user's own needs and also provides simple management of that user's collection tasks. The paper introduces the key technologies used in development and discusses the system's framework design, functional modules, and database design. The system is developed with Django's MTV (Model-Template-View) pattern. The underlying data collection layer is Scrapy, an asynchronous web-crawling application framework written in Python. Web pages are parsed by combining XPath with Python regular expressions. The jQuery tree plug-in zTree provides tree-structured task management, and Bootstrap is used to build the page layout and the combined query of collected data by task name plus keyword. The system consists of six main modules: web page parsing, data processing, system login, task creation, task management, and data query. Finally, the paper analyzes the data interaction between the browser and the server, and how web page data is located and parsed.
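Three techniques named in the abstract are easy to picture in code: parsing with XPath plus regular expressions, the combined task-name-and-keyword query, and zTree task trees. The following is a minimal, standard-library sketch under stated assumptions. It is not the authors' code. The page markup, the `record` table, and every function name are hypothetical. ElementTree's limited XPath subset stands in for Scrapy's selectors, and sqlite3 stands in for the system's actual database. In a real Scrapy spider, the parsing step would typically be written with `response.xpath(...)` followed by `.re(...)` on the resulting selector list.

```python
import json
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical task page. Real pages are seldom well-formed XML; Scrapy's
# lxml-based selectors cope with that, but ElementTree keeps this sketch stdlib-only.
SAMPLE_PAGE = """
<html><body>
  <div class="item"><a href="/news/101">Scrapy release notes</a><span>2018-06-01</span></div>
  <div class="item"><a href="/news/102">Django tips</a><span>Posted 2018-07-15</span></div>
</body></html>
"""

ID_RE = re.compile(r"/news/(\d+)")          # hypothetical URL scheme
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # ISO-style dates


def parse_items(html):
    """Page parsing: XPath locates item nodes, regexes extract values from their text."""
    root = ET.fromstring(html.strip())
    items = []
    for div in root.findall(".//div[@class='item']"):  # ElementTree's XPath subset
        link = div.find("a")
        if link is None:
            continue  # skip items without a link
        id_match = ID_RE.search(link.get("href", ""))
        date_match = DATE_RE.search(div.findtext("span", default=""))
        items.append({
            "id": int(id_match.group(1)) if id_match else None,
            "title": (link.text or "").strip(),
            "date": date_match.group(0) if date_match else None,
        })
    return items


def store(conn, task_name, items):
    """Data processing: persist parsed items under the task that collected them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record (task TEXT, item_id INTEGER, title TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO record VALUES (?, ?, ?, ?)",
        [(task_name, it["id"], it["title"], it["date"]) for it in items],
    )
    conn.commit()


def query(conn, task_name="", keyword=""):
    """Data query: task name and keyword are optional filters combined with AND."""
    sql = "SELECT item_id, title, date FROM record WHERE 1 = 1"
    params = []
    if task_name:
        sql += " AND task = ?"
        params.append(task_name)
    if keyword:
        sql += " AND title LIKE ?"  # SQLite LIKE is case-insensitive for ASCII
        params.append(f"%{keyword}%")
    return conn.execute(sql + " ORDER BY item_id", params).fetchall()


def to_ztree_json(tasks):
    """Task tree: (id, parent_id, name) rows as zTree simpleData nodes.

    A parent id that matches no task (0 here) makes the node a root.
    """
    return json.dumps([{"id": i, "pId": p, "name": n} for i, p, n in tasks])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, "tech-news", parse_items(SAMPLE_PAGE))
    print(query(conn, task_name="tech-news", keyword="scrapy"))
    # → [(101, 'Scrapy release notes', '2018-06-01')]
```

The query builder appends a condition only when a filter is supplied. This lets one endpoint serve a task-name search, a keyword search, or both. In the Django layer, the same filter could be expressed with the ORM, for example `Record.objects.filter(task=task_name, title__icontains=keyword)`, where `Record` is again a hypothetical model. The zTree output assumes the plug-in's simple data mode (`data.simpleData.enable = true`), whose default keys are `id`, `pId`, and `name`.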
Authors: 杨君 (YANG Jun), 陈春玲 (CHEN Chun-ling), 余瀚 (YU Han) (School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source: 《计算机技术与发展》 (Computer Technology and Development), 2018, No. 10, pp. 177-181 (5 pages)
Funding: National Natural Science Foundation of China (11501302)
Keywords: Scrapy; Django; data acquisition; Internet crawler