Abstract
This paper presents a new session identification algorithm based on a time threshold. When computing the threshold, it takes into account both the differences in content and structure among the website's pages and the individual differences among visitors. Existing methods apply the same threshold to every user, either a single a priori value or one derived statistically from page content. Compared with them, the proposed approach determines page access time thresholds more accurately, and it identifies sessions more efficiently and more faithfully to real browsing behavior.
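The abstract does not give the threshold formula, so the following Python sketch is purely illustrative and is not the paper's method. It assumes a hypothetical model in which each page gets a baseline (the median dwell time of that page over all visitors, standing in for page content and structure), each visitor gets a speed factor (the median of their dwell times relative to those baselines), and the threshold is k × baseline × factor. It then contrasts this with a single uniform threshold. The names estimate_thresholds, sessionize, k and DEFAULT_THRESHOLD (a 30-minute fallback) are assumptions introduced here, and dwell time is approximated by the gap to the visitor's next request.

```python
from collections import defaultdict
from statistics import median

# Hypothetical fallback for pages never seen in training: the common 30-minute rule.
DEFAULT_THRESHOLD = 1800.0


def group_by_user(records):
    """Group (user, page, timestamp_seconds) records into per-user, time-sorted visits."""
    by_user = defaultdict(list)
    for user, page, ts in records:
        by_user[user].append((ts, page))
    for visits in by_user.values():
        visits.sort()
    return by_user


def estimate_thresholds(records, k=3.0):
    """Return threshold(user, page) = k * page baseline * user speed factor (assumed model)."""
    page_dwells = defaultdict(list)  # page -> observed dwell times
    user_pairs = defaultdict(list)   # user -> [(page, dwell), ...]
    for user, visits in group_by_user(records).items():
        # Dwell time on a page is approximated by the gap to the same user's next request.
        for (t0, page), (t1, _) in zip(visits, visits[1:]):
            page_dwells[page].append(t1 - t0)
            user_pairs[user].append((page, t1 - t0))

    # Page term: median dwell over all visitors, a stand-in for page content and structure.
    page_base = {p: median(d) for p, d in page_dwells.items()}

    # Visitor term: median of this user's dwell times relative to the page baselines.
    user_factor = {}
    for user, pairs in user_pairs.items():
        ratios = [d / page_base[p] for p, d in pairs if page_base[p] > 0]
        user_factor[user] = median(ratios) if ratios else 1.0

    def threshold(user, page):
        if page_base.get(page, 0) <= 0:
            return DEFAULT_THRESHOLD
        return k * page_base[page] * user_factor.get(user, 1.0)

    return threshold


def sessionize(records, threshold):
    """Start a new session whenever the gap after a page exceeds threshold(user, page)."""
    sessions = []
    for user, visits in group_by_user(records).items():
        current = [visits[0][1]]
        for (t0, prev_page), (t1, page) in zip(visits, visits[1:]):
            if t1 - t0 > threshold(user, prev_page):
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions


# Toy logs: visitor "a" reads twice as fast as visitor "b".
train_log = [
    ("a", "home", 0), ("a", "article", 10), ("a", "home", 130),
    ("a", "article", 140), ("a", "home", 260),
    ("b", "home", 0), ("b", "article", 20), ("b", "home", 260),
    ("b", "article", 280), ("b", "home", 520),
]
new_log = [
    ("a", "home", 0), ("a", "article", 10), ("a", "home", 510),
    ("b", "home", 0), ("b", "article", 20), ("b", "home", 520),
]

adaptive = estimate_thresholds(train_log)
print(sessionize(new_log, adaptive))
# [('a', ['home', 'article']), ('a', ['home']), ('b', ['home', 'article', 'home'])]
print(sessionize(new_log, lambda user, page: DEFAULT_THRESHOLD))
# [('a', ['home', 'article', 'home']), ('b', ['home', 'article', 'home'])]
```

With these toy logs the same 500-second pause after "article" ends visitor a's session but not visitor b's, while the uniform 30-minute threshold keeps both visitors in one session each. Because dwell time is taken from the gap to the next request, the last page a visitor requests never contributes a dwell estimate.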
Source
Computer Applications and Software (《计算机应用与软件》)
Indexed in CSCD (Chinese Science Citation Database)
2010, No. 6, pp. 92-94 (3 pages)
Funding
Scientific Research Program of the Zhejiang Provincial Department of Education (Project No. 200070733)
Keywords
Web log mining
Session identification
Threshold
Data pre-processing