
Research on the Application of Cloud Computing in Web Structure Mining Algorithms (cited by: 2)
Abstract: Building on a study of PageRank, a classic Web structure mining algorithm, and MapReduce, a key cloud computing technology, this paper combines the PageRank algorithm with the MapReduce programming model. Parallel PageRank faces two problems when run on large datasets: every iteration accesses HDFS, which increases I/O cost, and every iteration spends too much time in the shuffle and sort phases. Two improved algorithms are proposed in response. The first is a parallel PageRank that applies matrix partitioning (blocking) to reduce the time spent in the shuffle and sort phases of each iteration. The second is a parallel PageRank that reduces the number of HDFS accesses. A cloud environment is then built with Hadoop, and the effect of different BlockSize settings on computing performance is analyzed experimentally. The original algorithm and the two improved algorithms are tested and compared on different Web datasets in this environment. The results show that the improved algorithms have certain advantages in the space occupied by the result set and in total iteration time, respectively.
Author: 蓝昊慧 (Lan Haohui)
Source: Computer Era (《计算机时代》), 2012, No. 10, pp. 30-33, 37 (5 pages)
Keywords: cloud computing; Web structure mining; distributed computing; MapReduce; Hadoop; PageRank
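
The abstract describes PageRank expressed in the MapReduce model, but this record does not include the paper's code. The following is a minimal single-process sketch of the baseline idea only: a map step emits rank contributions along out-links, and a reduce step sums them per page. The damping factor of 0.85, the function names, and the toy graph are all assumptions made for illustration. The sketch does not use Hadoop or HDFS, and it does not implement either of the two improved algorithms.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the abstract does not state one


def map_phase(graph, ranks):
    """Emit (target, contribution) pairs, as a PageRank mapper would."""
    for page, out_links in graph.items():
        if not out_links:
            continue  # dangling pages are handled in pagerank_iteration
        share = ranks[page] / len(out_links)
        for target in out_links:
            yield target, share


def reduce_phase(pairs):
    """Sum the contributions per target page, as a PageRank reducer would."""
    totals = defaultdict(float)
    for target, share in pairs:
        totals[target] += share
    return totals


def pagerank_iteration(graph, ranks):
    """Run one map/reduce round and apply the damping formula."""
    n = len(graph)
    # Rank held by pages without out-links is spread evenly over all pages.
    dangling = sum(ranks[p] for p, outs in graph.items() if not outs)
    totals = reduce_phase(map_phase(graph, ranks))
    return {
        page: (1 - DAMPING) / n + DAMPING * (totals[page] + dangling / n)
        for page in graph
    }


def pagerank(graph, iterations=50):
    """Iterate from a uniform start; every link target must be a graph key."""
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks


# Hypothetical four-page graph: page -> list of pages it links to.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
result = pagerank(toy_graph)
```

In a real Hadoop job, each call to `pagerank_iteration` would correspond to one MapReduce job whose output is written to HDFS and read back by the next iteration. That per-iteration round trip, together with the shuffle and sort between the map and reduce steps, is the cost the abstract says the improved algorithms target.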