Abstract
Web page similarity search plays an important role in research areas such as Web news recommendation and approximate querying. SimRank is a classic similarity computation model, but its pre-computation time and space costs are very high, which makes it unsuitable for large-scale Web page networks. Exploiting the fast convergence of SimRank, we propose an efficient SimRank-based Web page similarity search method (WSR). WSR pre-computes the 1-step iterative similarity matrix and then, at query time, uses this matrix to compute the 2-step iterative similarities between the given query page and all other pages. Static pruning of the Web network further improves the efficiency of both pre-computation and online query processing. Experimental results show that WSR markedly reduces storage and pre-computation time costs while achieving relatively high accuracy and fast query response times.
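The two-phase scheme described above (offline 1-step matrix, online 2-step scores for one query page) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard SimRank recurrence R_{k+1}(a, b) = c / (|I(a)| |I(b)|) * sum of R_k(i, j) over in-neighbors i of a and j of b, with R_0 the identity, an assumed decay factor c = 0.8, a link graph given as (source, target) pairs, and a sparse dictionary for the 1-step matrix. The function names are hypothetical, and the paper's static pruning is not modeled.

```python
from collections import defaultdict
from itertools import combinations

C = 0.8  # assumed SimRank decay factor (not specified in the abstract)


def in_neighbors(edges):
    """Map each page to the set of pages that link to it; edges are (src, dst) pairs."""
    ins = defaultdict(set)
    for src, dst in edges:
        ins[dst].add(src)
    return ins


def precompute_r1(ins, nodes, c=C):
    """Offline step (hypothetical helper): sparse 1-step SimRank matrix.

    R1(a, a) = 1 and, for a != b, R1(a, b) = c * |I(a) & I(b)| / (|I(a)| * |I(b)|).
    Only pairs sharing an in-neighbor are nonzero, so only those are stored.
    """
    r1 = {(v, v): 1.0 for v in nodes}
    outs = defaultdict(set)  # reverse index: page -> pages it links to
    for dst, srcs in ins.items():
        for src in srcs:
            outs[src].add(dst)
    for children in outs.values():
        # any two pages linked from the same source share that in-neighbor
        for a, b in combinations(sorted(children), 2):
            if (a, b) in r1:
                continue  # already scored via another shared source
            common = len(ins[a] & ins[b])
            score = c * common / (len(ins[a]) * len(ins[b]))
            r1[(a, b)] = r1[(b, a)] = score
    return r1


def query_r2(q, ins, nodes, r1, c=C):
    """Online step (hypothetical helper): 2-step SimRank between q and every page.

    R2(q, b) = c / (|I(q)| * |I(b)|) * sum over i in I(q), j in I(b) of R1(i, j).
    """
    iq = ins.get(q, set())
    scores = {}
    for b in nodes:
        ib = ins.get(b, set())
        if b == q:
            scores[b] = 1.0
        elif not iq or not ib:
            scores[b] = 0.0  # a page without in-links is similar only to itself
        else:
            total = sum(r1.get((i, j), 0.0) for i in iq for j in ib)
            scores[b] = c * total / (len(iq) * len(ib))
    return scores


# Toy link graph: w links to u and v; u and v both link to a and b.
edges = [("w", "u"), ("w", "v"), ("u", "a"), ("u", "b"), ("v", "a"), ("v", "b")]
nodes = ["a", "b", "u", "v", "w"]
ins = in_neighbors(edges)
r1 = precompute_r1(ins, nodes)
scores = query_r2("a", ins, nodes, r1)
print(scores)
```

In this toy graph, page b shares both in-links with the query page a and is the only other page with a nonzero 2-step score (about 0.72). The online step reads only the precomputed 1-step entries for in-neighbors of the query page and of each candidate, which is why storing just the 1-step matrix, sparsely, can be enough for answering single-page queries.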
Source
《计算机应用与软件》
CSCD
Peking University Core Journal
2014, Issue 1, pp. 57-61 (5 pages)
Computer Applications and Software
Funding
Natural Science Foundation of Shanxi Province (2012011014-2)