Abstract
This paper designs and implements a general-purpose distributed web data crawling system with high reliability and scalability. It describes the algorithms that run on the server and on the crawling nodes. It also builds a performance monitoring system for the crawling nodes on the real-time database InfluxDB and the visualization framework Grafana. With this system, data on the Internet can be crawled and collected rapidly on demand.
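The abstract does not say how crawling nodes report their performance data. The following is a minimal sketch that assumes each node serializes its metrics in InfluxDB's line protocol (`measurement,tag=value field=value timestamp`), which Grafana can then chart from InfluxDB. The measurement name `crawler_node`, the tag `node` and the fields `pages_fetched`, `avg_latency_ms` and `status` are hypothetical illustrations, not taken from the paper. The sketch only builds the text line. Sending it to the database's HTTP write endpoint is left out.

```python
from typing import Dict, Optional, Union

# Field values allowed in this sketch: integers, floats, booleans and strings.
FieldValue = Union[int, float, bool, str]


def _escape_measurement(name: str) -> str:
    # Measurement names must escape commas and spaces.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(s: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field_value(v: FieldValue) -> str:
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry a trailing "i"
    if isinstance(v, float):
        return repr(v)
    # String fields are double-quoted; escape backslashes first, then quotes.
    escaped = v.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one InfluxDB line-protocol record for a (hypothetical) crawler metric."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape_measurement(measurement)
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    for key in sorted(tags):
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(
        f"{_escape_key(k)}={_format_field_value(v)}" for k, v in sorted(fields.items())
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision by default
    return line


if __name__ == "__main__":
    # Example metric for one hypothetical crawling node.
    print(
        to_line_protocol(
            "crawler_node",
            tags={"node": "node-01"},
            fields={"pages_fetched": 120, "avg_latency_ms": 35.5, "status": "ok"},
            timestamp_ns=1460000000000000000,
        )
    )
```

Each node would emit one such line per reporting interval. Keeping the node identifier as a tag rather than a field lets Grafana group or filter the series per node.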
Source
Journal of Harbin University of Commerce: Natural Sciences Edition (《哈尔滨商业大学学报(自然科学版)》)
CAS
2016, No. 3, pp. 307-312 (6 pages)