摘要
本文提出了一种基于分解转移矩阵的PageRank的迭代计算方法。该方法对PageRank理论模型进一步推导,把其Markov状态转移矩阵进行了分解,从而降低存储开销和计算复杂度,减少I/O需求,使得PageRank计算的工程化实现更为简单。实验表明1 700多万的网页2.8亿条链接,可以在30秒内完成一次迭代,内存需求峰值585MB,可以满足工程化应用的需求。
This paper proposes a method of computing PageRank based on transfer matrix decomposition. Based on the PageRank random surfer model, the method decomposes the Markov states transfer matrix, so that the memory cost, computational complexity and I/O needs are reduced drastically. Experiments show that each iteration can be completed in 30 seconds and that the peak memory demand is 585MB during the PageRank computation of 17 million Web Pages containing 280 million links, indicating that this method meets the demand for engineering applications.
出处
《中文信息学报》
CSCD
北大核心
2007年第5期41-45,共5页
Journal of Chinese Information Processing
基金
863计划重点项目资助(2006AA010105)
北京市教委科技发展计划项目资助(KM200710772010)
北京市属市管高校人才强教计划项目资助(PXM2007_014224_044677
PXM2007_014224_044676)