Abstract
The social-network (community) internet is a new form of the internet built mainly on user-generated content, and its data has high statistical value. Because of access restrictions and frequent updates, traditional web crawlers have difficulty obtaining this data. This paper designs and implements a crawler that can log in automatically and intelligently crawl data according to how frequently it is updated. Unlike earlier crawlers, which work at the granularity of a page, this crawler takes the individual person as its smallest unit and uses the relationships between people to decide what to crawl, so it performs well at collecting this type of data.
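The abstract names two ideas: the person (not the page) is the unit of crawling, with relationships between people driving discovery, and revisits are scheduled by update frequency. The sketch below illustrates those two ideas only; it is not the authors' implementation, whose scheduling rule the abstract does not specify. Everything in it is a hypothetical assumption: `FAKE_SITE` stands in for pages fetched over an authenticated session (login is omitted), and `estimate_interval` is an invented heuristic that revisits users sooner when they posted more since the last visit.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical stand-in for a logged-in session: user ID -> (friend IDs,
# number of new posts seen on each visit). A real crawler would fetch this
# over HTTP after authenticating; that part is omitted here.
FAKE_SITE = {
    "alice": (["bob", "carol"], 5),
    "bob": (["alice", "dave"], 1),
    "carol": (["alice"], 0),
    "dave": (["bob"], 3),
}


@dataclass(order=True)
class Task:
    due: float                        # simulated time at which to (re)visit this user
    user: str = field(compare=False)  # the person is the unit of crawling


def estimate_interval(new_posts, base=60.0, min_interval=10.0, max_interval=600.0):
    """Hypothetical heuristic: the more a user posted, the sooner the revisit."""
    if new_posts == 0:
        return max_interval
    return max(min_interval, min(max_interval, base / new_posts))


def crawl(seed, fetch, horizon):
    """Person-granular crawl: pop the most overdue user, fetch their data,
    enqueue newly discovered friends, and reschedule the user by update rate.
    Returns the (time, user) visits performed up to `horizon`."""
    queue = [Task(0.0, seed)]
    seen = {seed}
    visits = []
    while queue:
        task = heapq.heappop(queue)
        if task.due > horizon:  # heap order: every remaining task is later too
            break
        friends, new_posts = fetch(task.user)
        visits.append((task.due, task.user))
        for friend in friends:  # follow relationships, not hyperlinks
            if friend not in seen:
                seen.add(friend)
                heapq.heappush(queue, Task(task.due, friend))
        next_due = task.due + estimate_interval(new_posts)
        heapq.heappush(queue, Task(next_due, task.user))
    return visits


if __name__ == "__main__":
    for t, user in crawl("alice", lambda u: FAKE_SITE[u], horizon=60.0):
        print(f"t={t:5.1f}  visit {user}")
```

With a 60-unit horizon, the active user `alice` is revisited every 12 units while `carol`, who posts nothing, is visited once. A priority queue keyed on the next due time is one simple way to let update frequency drive crawl order; the actual paper may use a different policy.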
Source
Intelligent Computer and Applications (《智能计算机与应用》), 2012, No. 4, pp. 65-67 (3 pages)