Abstract
Search engine systems face a growing demand for high-quality retrieval, yet the Nutch search engine framework does not implement Google's PageRank page-ranking algorithm. To address this, this paper analyzes the PageRank algorithm and verifies its effectiveness experimentally. It then builds a Hadoop distributed cluster and implements PageRank in the Nutch framework using the MapReduce distributed programming model. Experimental results show that with PageRank implemented, the Nutch search engine system retrieves results with higher accuracy and provides users with better retrieval services.
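The abstract does not spell out the formula or the job structure. The following is therefore a minimal sketch that assumes the common normalized PageRank formulation PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q) over pages q linking to p), with an assumed damping factor d = 0.85. Rank held by pages with no outlinks is spread evenly over all N pages. Each iteration is written as one map phase and one reduce phase, with the shuffle simulated in memory in plain Python rather than run on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `pagerank`) and the toy graphs are hypothetical. This is not the paper's Nutch code, and it assumes every link target is itself a key of the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor d; the source does not state its value


def map_phase(graph, ranks):
    """Map step: for each page, emit its adjacency list plus a rank share per outlink."""
    for page, outlinks in graph.items():
        # Carry the adjacency list so every page reaches the reducer,
        # even one with no inbound links (as a Hadoop job would re-emit it).
        yield page, ("links", outlinks)
        if outlinks:
            share = ranks[page] / len(outlinks)
            for target in outlinks:
                yield target, ("rank", share)


def shuffle(pairs):
    """Group mapper output by key, standing in for the MapReduce shuffle/sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, n, dangling_mass, d=DAMPING):
    """Reduce step: sum incoming shares and apply the damped PageRank formula."""
    new_ranks = {}
    for page, values in grouped.items():
        incoming = sum(v for tag, v in values if tag == "rank")
        # Rank from pages without outlinks is redistributed uniformly.
        new_ranks[page] = (1 - d) / n + d * (incoming + dangling_mass / n)
    return new_ranks


def pagerank(graph, iterations=50):
    """Driver loop: each iteration corresponds to one chained map/reduce job."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        dangling_mass = sum(ranks[p] for p, out in graph.items() if not out)
        ranks = reduce_phase(shuffle(map_phase(graph, ranks)), n, dangling_mass)
    return ranks
```

For example, on the graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}` page C, which has the most inbound links, ends up with the highest rank. Page D, which has no inbound links, keeps only the base value (1 - d)/N. On a real cluster the framework performs the shuffle, and the driver would chain one Hadoop job per iteration or stop once ranks converge.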
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2013, No. 2, pp. 139-143 (5 pages)
Funding
National Natural Science Foundation of China (61163057)
Guangxi Natural Science Foundation (2012GXNSFAA053228)