Research on a Personalized Method for Web Log Session Identification (cited by: 7)

Research on method for session identification in Web log mining
Abstract: Session identification is an important step in Web log mining. Building on existing session identification methods, this paper proposes an improved method that derives a personalized threshold for each user from several parameters, including page content and download time. The method identifies sessions by checking whether the interval between accesses falls within a range bounded by a maximum and a minimum threshold. Page importance is determined from page content and site structure, the user's normal reading time is determined from the information volume of the page, and the time at which reading starts is determined from the page download time recorded in the Web log; these factors are then combined to adjust the threshold. Experimental results show that, compared with methods that apply a single a priori threshold to all users and pages, and with methods that dynamically adjust a threshold per page, the proposed method determines page access time thresholds more accurately for individual users and is more reasonable and effective.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2008, No. 8, pp. 179-182 (4 pages)
Funding: Natural Science Foundation of Shanxi Province of China (Grant Nos. 2006011030, 2007011050)
Keywords: Web mining; session identification; data preprocessing; threshold
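
The abstract describes the thresholding idea only in outline, so the following Python sketch is an illustration under stated assumptions rather than the authors' algorithm. It assumes that each page gets its own timeout, computed as download time plus an importance-weighted reading time, and that this timeout is clamped between a minimum and a maximum threshold. The names `Request`, `page_threshold`, `split_sessions`, the constants `T_MIN`, `T_MAX`, `READ_BYTES_PER_SEC`, and the combination formula itself are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    timestamp: float      # seconds; time the request appears in the log
    url: str
    size_bytes: int       # stand-in for the page's information volume
    download_secs: float  # download time taken from the log
    importance: float     # hypothetical page weight, e.g. in (0, 1]


# Hypothetical constants; the abstract gives no concrete values.
T_MIN = 30.0                # minimum threshold, seconds
T_MAX = 1800.0              # maximum threshold, seconds
READ_BYTES_PER_SEC = 200.0  # assumed normal reading speed


def page_threshold(req: Request) -> float:
    """Per-page timeout: download time plus weighted reading time,
    clamped to [T_MIN, T_MAX]. The formula is an assumption."""
    reading_secs = req.size_bytes / READ_BYTES_PER_SEC
    raw = req.download_secs + req.importance * reading_secs
    return max(T_MIN, min(T_MAX, raw))


def split_sessions(requests: List[Request]) -> List[List[Request]]:
    """Split one user's requests into sessions. A new session starts
    when the gap after a page exceeds that page's threshold."""
    sessions: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.timestamp):
        if current:
            prev = current[-1]
            gap = req.timestamp - prev.timestamp
            if gap > page_threshold(prev):
                sessions.append(current)
                current = []
        current.append(req)
    if current:
        sessions.append(current)
    return sessions


log = [
    Request(0.0, "/index", 2000, 1.0, 1.0),
    Request(20.0, "/article", 100000, 2.0, 0.5),
    Request(200.0, "/nav", 1000, 1.0, 1.0),
    Request(300.0, "/index", 2000, 1.0, 1.0),
]
sessions = split_sessions(log)
```

In this sketch a long article tolerates a longer pause than a short navigation page, so the 180-second gap after `/article` does not end the session while the 100-second gap after `/nav` does. A single fixed 30-minute timeout would have placed all four requests in one session.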