
Research and implementation of theme-based data crawling technology with Python framework (original title: Python框架下基于主题的数据爬取技术研究与实现)

Cited by: 3
Abstract: Searching for information and shopping online have become part of everyday life. On many systems, viewing goods or resources means clicking through multiple pages, and the longer users browse, the more overwhelming the experience becomes. Showing users only the data they need would make filtering resources more efficient. This paper uses Python together with the popular Spring MVC framework to crawl data from a target website, designs a data crawling module and a data display module, and implements a theme-based crawler framework. Crawling experiments and result tests show that the data of the target website was crawled successfully and displayed on the authors' own page, achieving the expected goal.
Authors: Yan Fei (严斐), Xiao Pu (肖璞) (Sanjiang University, Nanjing, Jiangsu 210012, China)
Source: Computer Era (《计算机时代》), 2018, No. 11, pp. 10-13 (4 pages)
Funding: General Program of Natural Science Research of Jiangsu Higher Education Institutions (17KJD520007)
Keywords: data crawling; theme-based; crawler; Spring MVC
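
The abstract does not describe how the crawling module decides which pages belong to the theme. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way a theme-based crawler can filter links: it parses a page with Python's standard `html.parser` and keeps only the links whose anchor text contains a topic keyword. The names `LinkCollector`, `topic_links`, and `SAMPLE`, the keyword list, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def topic_links(html, base_url, keywords):
    """Return the links whose anchor text mentions any topic keyword."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [(url, text) for url, text in parser.links
            if any(k in text.lower() for k in wanted)]


# Hypothetical page content standing in for a downloaded target page.
SAMPLE = """
<ul>
  <li><a href="/item/1">Python crawler tutorial</a></li>
  <li><a href="/ads/9">Big sale today</a></li>
  <li><a href="https://example.com/item/2">Spring MVC guide</a></li>
</ul>
"""

if __name__ == "__main__":
    for url, text in topic_links(SAMPLE, "https://example.com/", ["python", "spring"]):
        print(url, text)
```

In a full crawler, the filtered links would be queued for download and the extracted records handed to a display layer. The sketch leaves out downloading, politeness delays, and storage so that it runs without network access.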