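Abstract
The Web has developed rapidly over the past few decades, and low latency and fast response have become increasingly important. To meet this demand, files that a user is about to access are typically prefetched into a cache and data is served from a proxy server cache, which avoids network congestion and improves Web access efficiency. An effective prediction model is therefore essential to prefetching. To address two weaknesses of current cache-prefetching work, namely insufficient attention to differences among users and reliance on a single evaluation metric, we propose a Web prediction model based on user grading that adjusts multiple parameters dynamically as Web requests arrive. The model analyzes the trend in the distribution of user accesses on the proxy server, divides the user set into several levels of differing importance, and uses sequence similarity where appropriate to cluster the sessions generated by low-contribution users. Then, building on the Prediction by Partial Match model and incorporating cache replacement strategies, it constructs a multi-parameter objective function for the nodes of the prediction tree and enables the constructed model to adjust itself adaptively. Finally, experiments show that the model effectively improves cache prefetching performance.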
With the rapid development of the Web over the past few decades, the demand for low latency and fast response has become increasingly urgent. To meet this demand, prefetching techniques are widely used: documents that a user is likely to request are loaded into caches in advance, which helps avoid network congestion and improves access efficiency. An effective prediction model is therefore essential to prefetching. Considering the need for both high accuracy and practicality, we use the Prediction by Partial Match (PPM) suffix tree as the base model for predicting web pages. We identify two deficiencies in current cache-prefetching work: it neglects differences among users, and it relies on oversimplified metrics. We then present a multi-parameter web prediction model, based on a user hierarchy, that adjusts itself adaptively. The main contributions are as follows. First, we propose a user classification model based on the historical access log. User behavior is analyzed to obtain the distribution of user accesses, and the model classifies users into categories according to the distribution of user contribution degree. Users with different contribution degrees are assigned different weights. In addition, for users with very low contribution, we align their web access sequences and cluster them. Second, we present a method for constructing the prediction model that assigns each node an objective function of multiple parameters. The objective function is built from elements related to cache replacement strategies, such as page access heat and the accumulated user-class weight based on access frequency. The node with the maximum objective value is regarded as having the strongest predictive ability. We also establish an adjustment mechanism that operates while the prediction tree is in use, so the model can learn continuously and adjust dynamically. Finally, we compare our model with several existing models through experiments. Our model achieves better prediction accuracy and cache hit ratio, and the results can be improved further by tuning the model parameters.
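As a rough illustration of the idea summarized above, the following Python sketch shows a PPM-style next-page predictor whose transition counts are scaled by a per-user weight. It is a minimal sketch under stated assumptions, not the model from the paper: the class name `WeightedPPM`, the `user_weight` parameter, and the use of a single accumulated weight as the node score are all hypothetical stand-ins. The abstract does not give the actual objective function, which also involves page access heat and an adaptive adjustment mechanism.

```python
from collections import defaultdict


class WeightedPPM:
    """Illustrative order-k PPM-style predictor with per-user weights.

    Assumption: a node's score is the sum of the weights of the users
    whose sessions passed through it. The paper's real objective function
    has more parameters and is not reproduced here.
    """

    def __init__(self, max_order=2):
        self.max_order = max_order
        # context (tuple of pages) -> next page -> accumulated weight
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, session, user_weight=1.0):
        """Record every context of length 1..max_order that precedes each page."""
        for i in range(1, len(session)):
            next_page = session[i]
            for k in range(1, self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(session[i - k:i])
                self.counts[context][next_page] += user_weight

    def predict(self, recent_pages):
        """Try the longest matching context first, then back off to shorter ones."""
        for k in range(min(self.max_order, len(recent_pages)), 0, -1):
            context = tuple(recent_pages[-k:])
            followers = self.counts.get(context)
            if followers:
                return max(followers, key=followers.get)
        return None  # no context matched; nothing to prefetch


model = WeightedPPM(max_order=2)
model.train(["A", "B", "C"], user_weight=3.0)  # one high-contribution user
model.train(["A", "B", "D"], user_weight=1.0)  # two low-contribution users
model.train(["A", "B", "D"], user_weight=1.0)
print(model.predict(["A", "B"]))  # "C": weight 3.0 outweighs 2.0 for "D"
```

With unweighted counts the same data would favor "D" (two sessions against one), which is the kind of difference that user grading is meant to introduce.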
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2018, No. 1, pp. 85-96 (12 pages)
Funding
National Natural Science Foundation of China (61472070, 61672142)
Keywords
Web prefetching
cache
user differentiation
multi-parameter
self-adaptive PPM (Prediction by Partial Match)