Abstract
With the continuing development of big data technologies, data is becoming increasingly important, and how to obtain large amounts of data at low cost is a question worth studying. Collecting data with web crawlers is a convenient, low-cost way to acquire network data. However, a crawler running on a single machine is clearly insufficient for gathering larger volumes of data. This paper therefore studies distributed web crawler software and proposes a feasible, low-cost implementation scheme.
Source
Modern Computer (《现代计算机》)
2017, No. 16, pp. 62-65 (4 pages)
Keywords
Web Crawler
Big Data
Distributed