Research and implementation of Nutch Web sort algorithm based on Hadoop (cited by: 4)
Abstract: The Nutch search engine framework does not implement Google's PageRank page ranking algorithm. To meet the growing demand for high-quality retrieval in search engine systems, this paper analyzes the PageRank algorithm and verifies its effectiveness experimentally, builds a Hadoop distributed cluster, and implements PageRank in the Nutch framework using the MapReduce distributed programming model. Experimental results show that, once PageRank is implemented, the Nutch search engine system retrieves with higher accuracy and provides users with better retrieval services.
Source: Journal of Guilin University of Electronic Technology, 2013, No. 2, pp. 139-143 (5 pages)
Funding: National Natural Science Foundation of China (61163057); Guangxi Natural Science Foundation (2012GXNSFAA053228)
Keywords: Hadoop cluster; MapReduce; Nutch; page sort algorithm; PageRank
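
The abstract describes implementing PageRank with the MapReduce programming model. As a rough illustration of that idea only, the sketch below runs the map and reduce phases of PageRank in memory in plain Python. It is not the authors' code and uses no Hadoop or Nutch API. The function names, the damping factor of 0.85, the normalized form of the rank formula, the fixed iteration count, and the toy graph are all assumptions made for this example, and it further assumes every page has at least one out-link and every link target is itself a page in the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the abstract does not state one


def map_page(page, rank, out_links):
    """Map step: pass the link structure along and split the rank among out-links."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def reduce_page(page, values, num_pages):
    """Reduce step: sum the incoming rank shares and apply the damping formula."""
    incoming = 0.0
    for kind, payload in values:
        if kind == "rank":
            incoming += payload
    # Normalized PageRank form (an assumption): ranks sum to 1 across all pages.
    return (1 - DAMPING) / num_pages + DAMPING * incoming


def pagerank_iteration(graph, ranks):
    """One map / shuffle / reduce pass, simulated in memory."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for page, out_links in graph.items():
        for key, value in map_page(page, ranks[page], out_links):
            grouped[key].append(value)
    n = len(graph)
    return {page: reduce_page(page, values, n) for page, values in grouped.items()}


def pagerank(graph, iterations=30):
    """Iterate a fixed number of passes from a uniform starting rank."""
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph: A links to B and C, B to C, C to A.
    toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(toy_graph).items()):
        print(page, round(rank, 4))
```

In this toy graph, page C is linked from both A and B, so it ends up ranked above B. On a real cluster each pass would be a separate MapReduce job reading and writing the rank table, and pages without out-links would need extra handling that this sketch omits.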