
Modeling Lifetime of Web Pages Based on User Interest Analysis (cited by: 5)
Abstract: The activity level of a web page changes over its lifetime. Some pages are valuable only during a specific period and then become obsolete. Analyzing web page lifetime from the users' perspective can improve the performance of web crawlers and search engines and the effectiveness of web advertising. Using page view data collected by a proxy server, we perform a large-scale analysis of web page lifetime. Based on an experiment with two months of page view data covering more than 36,000,000 web pages, we give a model that describes how user interest evolves. The model helps to better understand how the web is organized and how it operates.
Source: Journal of Chinese Information Processing (indexed in CSCD and the Peking University Core Journals list), 2008, Issue 2, pp. 76-80 (5 pages)
Funding: National 973 Key Basic Research Program (2004CB318108); National Natural Science Foundation of China (60621062, 60503064, 60736044); National 863 High-Tech Program (2006AA01Z141)
Keywords: computer application; Chinese information processing; user behavior analysis; web page lifetime; weblog mining











