
Method of session identification based on web page features (cited by: 1)
Abstract: This paper proposes a method of session identification based on web page features. The features of the web pages themselves are analysed, and a feature vector is computed for every page in the site. From these feature vectors, the degree of relatedness between any two pages can be computed. Following the chronological order of the user's page requests in the log, a relatedness curve is obtained for all directly adjacent page records in the log. A threshold is then set, and session boundaries are placed where the relatedness curve fluctuates sharply. Pages that are highly related are grouped into the same session, which completes session identification.
Source: Journal of Yanshan University (《燕山大学学报》), CAS, 2008, No. 1, pp. 10-13 (4 pages)
Keywords: web log mining; data preprocessing; session identification
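
The pipeline outlined in the abstract (feature vectors, pairwise relatedness, a relatedness curve over adjacent log records, and a threshold that marks session boundaries) can be illustrated with a minimal sketch. The abstract does not say how the feature vectors or the relatedness measure are defined, so this sketch assumes TF-IDF term weights and cosine similarity, and it assumes a boundary is placed wherever the relatedness of two adjacent requests falls below the threshold. All function names, URLs, terms, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(pages):
    """Build a sparse TF-IDF vector (term -> weight) for every page.

    `pages` maps a URL to its list of terms. TF-IDF is an assumption;
    the abstract only states that a feature vector is computed per page.
    """
    n = len(pages)
    doc_freq = Counter()
    for terms in pages.values():
        doc_freq.update(set(terms))
    vectors = {}
    for url, terms in pages.items():
        counts = Counter(terms)
        vectors[url] = {
            term: (count / len(terms)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed relatedness measure)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness_curve(requests, vectors):
    """Relatedness of each pair of directly adjacent requests, in log order."""
    return [cosine(vectors[a], vectors[b]) for a, b in zip(requests, requests[1:])]


def split_sessions(requests, vectors, threshold):
    """Start a new session wherever adjacent relatedness drops below threshold."""
    if not requests:
        return []
    sessions = [[requests[0]]]
    for page, score in zip(requests[1:], relatedness_curve(requests, vectors)):
        if score < threshold:
            sessions.append([page])  # low relatedness: session boundary
        else:
            sessions[-1].append(page)
    return sessions


# Hypothetical site content and one user's requests in log (time) order.
pages = {
    "/news/a": ["football", "league", "goal"],
    "/news/b": ["football", "goal", "coach"],
    "/shop/x": ["laptop", "price", "battery"],
    "/shop/y": ["laptop", "battery", "screen"],
}
requests = ["/news/a", "/news/b", "/shop/x", "/shop/y"]

vectors = tfidf_vectors(pages)
curve = relatedness_curve(requests, vectors)
sessions = split_sessions(requests, vectors, threshold=0.1)
print(curve)
print(sessions)
```

In this toy log the two news pages share terms, as do the two shop pages, while the news-to-shop transition shares none, so the single dip in the curve becomes the only session boundary. The threshold of 0.1 is arbitrary; in practice it would have to be tuned against the site's own relatedness distribution.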


