Abstract
With the development of the Internet and the spread of web crawling technology, more and more individual users rely on web crawlers to collect information from websites, and excessive crawling can disrupt the operation of the target site. To keep out web crawlers other than those of search engines, some websites deploy anti-spider techniques. This article examines several common anti-spider measures, along with the methods and strategies for crawling websites that use such mechanisms.
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2017, No. 5X, pp. 19-21, 23 (4 pages)
Keywords
web crawler
anti-spider
anti-anti-spider
crawl strategy
distributed crawler