Abstract
This paper covers the design and implementation of a web crawler, the implementation of anti-crawler techniques, and research on related technologies. It studies how a target website negotiates its crawler access threshold and the conditions for passing it, along with anti-crawler techniques and their latest developments. On that basis, a complete web crawler is designed and implemented in Python, which extracts and stores all article data from the target website. Tests against the laboratory's internal website are used to study both anti-crawler techniques and ways of bypassing them. The paper also gives a theoretical account of web crawler and anti-crawler technology and an outlook on their development.
Author
LI Pei (李培)
School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121; Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts & Telecommunications, Xi'an 710121
Source
Computer & Digital Engineering (《计算机与数字工程》)
2019, No. 6, pp. 1415-1420, 1496 (7 pages in total)
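Funding
National Natural Science Foundation of China (Grant No. 61105064)
Natural Science Basic Research Plan of Shaanxi Province (Grant No. 2016JM6085)
Scientific Research Program of the Shaanxi Provincial Department of Education, "Research on Sentiment Orientation in Online Communities Based on Text Mining" (Grant No. 17JK0687)
Special Fund for Key Discipline Construction in Regular Institutions of Higher Education of Shaanxi Province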