A Method of Computing PageRank Based on Transition Matrix Decomposition

Cited by: 4

Abstract: This paper proposes an iterative method for computing PageRank based on decomposing the transition matrix. Starting from a further derivation of the theoretical PageRank model (the random surfer model), the method decomposes the Markov state transition matrix. This lowers storage overhead and computational complexity, reduces I/O requirements, and makes an engineering implementation of PageRank computation simpler. Experiments show that for more than 17 million web pages with 280 million links, one iteration completes within 30 seconds with a peak memory requirement of 585 MB, which meets the needs of engineering applications.
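The abstract does not reproduce the decomposition itself, so the following is only a minimal sketch of one widely used way to split the PageRank transition matrix, and it is not necessarily the authors' formulation. It assumes the textbook form G = αH + (αa + (1-α)e)vᵀ, where H is the sparse row-normalized link matrix, a marks pages with no out-links, e is the all-ones vector, and v is a uniform teleport vector. Because the second term has rank one, each iteration needs only one pass over the edge list plus a single scalar, and the dense n×n matrix is never stored. The function name `pagerank`, the edge-list input format, and the damping value 0.85 are illustrative assumptions.

```python
def pagerank(links, n, alpha=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a decomposed transition matrix (illustrative sketch).

    links: list of (src, dst) pairs; pages are numbered 0..n-1 (assumed format).
    """
    # Out-degree of every page; pages with out-degree 0 are "dangling".
    out_degree = [0] * n
    for src, _ in links:
        out_degree[src] += 1

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Sparse part: alpha * x^T H, computed in one pass over the edge list.
        new_rank = [0.0] * n
        for src, dst in links:
            new_rank[dst] += alpha * rank[src] / out_degree[src]

        # Rank-one part: dangling mass and teleportation collapse to one
        # scalar that is added to every page (uniform teleport vector assumed).
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        shared = (alpha * dangling + (1.0 - alpha)) / n
        new_rank = [r + shared for r in new_rank]

        # Stop when the L1 change between successive iterates is small.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

In this sketch the memory footprint is the edge list plus a few length-n vectors, which is the general kind of saving the abstract describes. The figures reported in the abstract come from the authors' own implementation, not from this code.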

Authors: Liu Songbin (刘松彬), Du Yuncheng (都云程), Shi Shuicai (施水才)

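Affiliation: Chinese Information Processing Research Center, Beijing Information Science and Technology University (北京信息科技大学中文信息处理研究中心)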

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2007, No. 5, pp. 41-45 (5 pages)

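Funding: National 863 Program key project (2006AA010105); Beijing Municipal Education Commission Science and Technology Development Program (KM200710772010); Beijing municipal universities talent development program (人才强教计划) (PXM2007_014224_044677, PXM2007_014224_044676)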

Keywords: computer application; Chinese information processing; PageRank; search engine; Markov state transition matrix; matrix decomposition

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]



