Research on Python-Based Web Crawler and Anti-Crawler Technology (基于Python的网络爬虫与反爬虫技术研究)

Cited by: 48
Abstract: This paper covers the design and implementation of a web crawler, the implementation of anti-crawler techniques, and related technologies. It studies the negotiation of crawler access thresholds on the target website and the conditions for passing them, together with anti-crawler techniques and their recent development. A complete web crawler is designed and implemented in Python, which extracts and stores all article data from the target website. Tests on the laboratory's internal website are used to study anti-crawler measures and ways of bypassing them. The paper also gives a theoretical account of web crawler and anti-crawler technology and discusses its development prospects.
Author: LI Pei (李培) (School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121; Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts & Telecommunications, Xi'an 710121)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2019, No. 6, pp. 1415-1420, 1496 (7 pages in total)
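Funding: National Natural Science Foundation of China (No. 61105064); Natural Science Basic Research Plan of Shaanxi Province (No. 2016JM6085); Scientific Research Program of the Shaanxi Provincial Department of Education, "Research on Sentiment Orientation in Online Communities Based on Text Mining" (No. 17JK0687); Special Fund for the Construction of Key Disciplines in General Higher Education Institutions of Shaanxi Province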
Keywords: web crawler; Scrapy framework; anti-crawler