An improved Pagerank algorithm based on cloud computing (基于云计算的Pagerank算法的改进)


Abstract: With the advent of the cloud computing era, Web mining based on cloud computing has become an important new research topic. A new method is proposed to address the large number of iterations that the PageRank algorithm requires in Web structure mining. By improving the formula used to compute the original PageRank value, the method reduces the number of iterations. Experiments show that, in a cloud computing environment, the method reduces network traffic and the cost of accessing HDFS, and that it takes less time than the traditional PageRank algorithm.
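This record does not reproduce the paper's improved formula, so the sketch below shows only the baseline being improved: standard PageRank organised as one map phase and one reduce phase per iteration. It is a minimal single-machine illustration, not the author's implementation. The function name `pagerank_mapreduce`, the damping factor 0.85, the tolerance, and the four-page graph are all assumptions chosen for the example. On a Hadoop cluster each pass of the loop would typically be a separate MapReduce job that reads its input from HDFS and writes its output back, which is why the iteration count drives the network and HDFS cost the abstract refers to.

```python
from collections import defaultdict


def pagerank_mapreduce(links, damping=0.85, tol=1e-6, max_iter=100):
    """Baseline PageRank with one map phase and one reduce phase per iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns (ranks, iterations_used).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for iteration in range(1, max_iter + 1):
        # Map phase: each page emits an equal share of its rank to every out-link.
        contributions = defaultdict(float)
        dangling = 0.0
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = ranks[page] / len(targets)
                for target in targets:
                    contributions[target] += share
            else:
                # Pages without out-links spread their rank evenly over all pages.
                dangling += ranks[page]

        # Reduce phase: sum the contributions per page and apply the damping factor.
        new_ranks = {
            p: (1.0 - damping) / n + damping * (contributions[p] + dangling / n)
            for p in pages
        }

        # Stop once the total change in rank falls below the tolerance.
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Hypothetical four-page graph used only for illustration.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, iterations = pagerank_mapreduce(graph)
print(iterations, sorted(ranks.items()))
```

The returned `iterations` value is the number of map/reduce rounds the baseline needed on this graph; a formula that converges in fewer rounds would save one full job, with its HDFS read and write, for every round avoided.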

Author: Zheng Jing (郑晶)

Affiliation: College of Electronic Information Science, Fujian Jiangxia University (福建江夏学院电子信息科学学院)

Source: Journal of Fuzhou University (Natural Science Edition) (《福州大学学报（自然科学版）》), 2014, No. 1, pp. 45-49 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

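Funding: National Natural Science Foundation of China (30671680); National Innovation Fund for Technology-based Small and Medium Enterprises (11C26213502126); Science and Technology Project of the Fujian Provincial Department of Education (JA11269); Youth Project of Fujian Jiangxia University (2011C005)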

Keywords: cloud computing; Web structure mining; PageRank; MapReduce

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]




