UCM-PPM: A Multi-Parameter Web Prediction Model Based on User Classification


Abstract: The Web has grown rapidly over the past few decades, and the demand for low latency and fast response has become increasingly urgent. To meet this demand, prefetching techniques are widely used: documents a user is likely to request next are fetched into a proxy-server cache in advance, which avoids network congestion and improves access efficiency. An effective prediction model is therefore essential to prefetching. Considering the need for both high accuracy and practicability, we use the Prediction by Partial Match (PPM) suffix tree as the base model for predicting Web pages. We identify two weaknesses in current cache-prefetching work: user differences are neglected, and a single metric is used. We then present a multi-parameter Web prediction model, based on a user hierarchy, that adjusts itself adaptively as Web requests arrive.

The main contributions are as follows. First, we propose a user classification model built from the historical access log. User behavior on the proxy server is analyzed to obtain the distribution of user access activity, and users are classified into categories according to the distribution of their contribution degree. Users with different contribution degrees are assigned different weights. For users with very low contribution, we align their page access sequences and cluster the resulting sessions by sequence similarity. Second, we present a method that assigns each node of the prediction tree an objective function with multiple parameters. The objective function combines elements related to cache replacement strategies, such as page access heat, with the accumulated user-class weight based on access frequency. The node with the maximum objective value is regarded as having the strongest predictive ability. We also establish an adjustment mechanism that operates while the prediction tree is in use, so the model can learn continuously and adjust dynamically. Finally, we compare our model with several existing models experimentally. Our model achieves better prediction accuracy and cache hit ratio, and the results can be improved further by tuning the model parameters.
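The idea of weighting a PPM prediction tree by user class can be illustrated with a minimal sketch. It is not the authors' UCM-PPM implementation: the abstract does not give the objective function, so the sketch assumes the simplest stand-in, where each transition count is accumulated with a per-user weight and the follower with the largest accumulated weight is predicted. The class name `WeightedPPM` and the parameters `max_order` and `user_weight` are hypothetical.

```python
from collections import defaultdict


class WeightedPPM:
    """Order-k PPM-style next-page predictor with per-user weights (illustrative only)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # context (tuple of preceding pages) -> next page -> accumulated weight
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, session, user_weight=1.0):
        """Add one session; user_weight stands in for the user's class weight (assumption)."""
        for i, page in enumerate(session):
            # Record `page` under every context of length 0..max_order that precedes it.
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(session[i - k:i])
                self.counts[context][page] += user_weight

    def predict(self, recent):
        """Return the heaviest follower of the longest matching context, or None."""
        for k in range(min(self.max_order, len(recent)), -1, -1):
            context = tuple(recent[len(recent) - k:])
            followers = self.counts.get(context)
            if followers:
                return max(followers, key=followers.get)
        return None


model = WeightedPPM(max_order=2)
model.train(["A", "B", "C"], user_weight=3.0)  # one session from a high-contribution user
model.train(["A", "B", "D"], user_weight=1.0)  # two sessions from low-contribution users
model.train(["A", "B", "D"], user_weight=1.0)
print(model.predict(["A", "B"]))  # → C
```

With unweighted counts, page D (seen twice) would beat page C (seen once) after the context A, B. With the assumed weights, C accumulates 3.0 against 2.0 for D, so the high-contribution user's behavior dominates the prediction. A fuller model would replace the plain weight sum with an objective function that also accounts for page access heat and would update weights online.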

Authors: Wang Zhuojun (王卓君), Shen Derong (申德荣), Nie Tiezheng (聂铁铮), Kou Yue (寇月), Yu Ge (于戈)

Affiliation: School of Computer Science and Engineering, Northeastern University (东北大学)

Source: Journal of Nanjing University (Natural Science) (《南京大学学报（自然科学版）》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, Issue 1, pp. 85-96 (12 pages)

Funding: National Natural Science Foundation of China (61472070, 61672142)

Keywords: Web prefetching; cache; user differentiation; multi-parameter; self-adaptation; PPM (Prediction by Partial Match)

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]


