Abstract
The main disadvantage of Heritrix is its slow crawling speed, which seriously limits the speed of information search. This paper applies the ELFHash algorithm to optimize Heritrix's multithreading. The optimization increases the number of crawl threads and enables precise crawling of specified web pages, which speeds up page crawling. Experimental results show that the optimization technique is feasible.
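The abstract does not spell out how ELFHash is wired into Heritrix. A common approach, assumed here, is to replace host-based queue assignment (which places all URLs of one site into a single frontier queue, so effectively one thread crawls that site) with an ELFHash of the URL taken modulo a fixed queue count. URLs from the same site then spread across many queues that separate threads can drain in parallel. The minimal sketch below shows only that hashing and mapping step. The class `ElfHashQueues`, its methods, and the queue count are hypothetical illustrations, not the paper's code or the Heritrix API. In a real crawler this logic would sit inside a custom queue-assignment policy (Heritrix's frontier `QueueAssignmentPolicy`), and the exact subclassing details depend on the Heritrix version.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical sketch: spread URLs across a fixed number of frontier queues
 * with ELFHash so several crawl threads can work on one site concurrently.
 * Not part of the Heritrix API.
 */
final class ElfHashQueues {

    /** Classic ELF hash over the UTF-8 bytes of s, kept to 32 unsigned bits. */
    static long elfHash(String s) {
        long h = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h = ((h << 4) + (b & 0xFF)) & 0xFFFFFFFFL; // shift in the next byte, wrap like uint32
            long g = h & 0xF0000000L;                  // high nibble
            if (g != 0) {
                h ^= g >>> 24;                         // fold the high nibble into the low bits
            }
            h &= ~g;                                   // then clear the high nibble
        }
        return h;
    }

    /** Map a URL to a queue index in 0 .. queueCount-1 (hypothetical policy). */
    static int queueIndex(String url, int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        return (int) (elfHash(url) % queueCount);
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://www.example.com/a.html",
            "http://www.example.com/b.html",
            "http://www.example.com/c.html");
        for (String u : urls) {
            // Same host, but each URL may land in a different queue.
            System.out.println(queueIndex(u, 50) + "  " + u);
        }
    }
}
```

Because the hash is deterministic, a given URL always maps to the same queue, which keeps duplicate detection per queue consistent. The queue count (50 in the usage example) is an arbitrary illustrative value that would be tuned together with the number of crawl threads.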
Source
Journal of Hubei University of Technology (《湖北工业大学学报》), 2012, No. 2, pp. 23-26 (4 pages)