Abstract
To address session identification in Web log mining, this paper improves both the Timeout method and the reference-length method and proposes an improved session identification method. The method uses the site's topology to set a time-interval threshold for each page dynamically, so that a page's threshold reflects its importance. It also defines content pages more flexibly and introduces heuristic rules for them, removing the limitation inherent in the reference-length method that a session can contain only one content page. The method improves the accuracy of session identification, and experimental results show that it is effective.
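The abstract does not give the paper's threshold formula. As a rough illustration only, the following Python sketch shows timeout-based sessionization with per-page thresholds derived from the site's link graph. The importance measure (a page's in-degree divided by the largest in-degree), the rule threshold = base × (1 + importance), the 600-second base, and the names `page_thresholds` and `sessionize` are all hypothetical assumptions, not the authors' method. Each gap between consecutive clicks is compared with the threshold of the earlier page, on the assumption that the gap approximates the time spent on that page.

```python
from collections import defaultdict


def page_thresholds(links, base=600.0):
    """Return a per-page timeout threshold in seconds.

    HYPOTHETICAL rule (not from the paper): importance is a page's
    in-degree divided by the largest in-degree in the site graph, and
    threshold = base * (1 + importance). An unlinked page gets `base`;
    the most-linked page gets 2 * base.
    """
    in_degree = defaultdict(int)
    pages = set(links)  # pages that have outgoing links
    for targets in links.values():
        for dst in targets:
            in_degree[dst] += 1
            pages.add(dst)
    max_deg = max(in_degree.values(), default=0) or 1  # avoid dividing by 0
    return {p: base * (1 + in_degree[p] / max_deg) for p in pages}


def sessionize(clicks, thresholds, default=600.0):
    """Split one user's (timestamp, url) clicks into sessions of URLs.

    A new session starts when the gap after a page exceeds that page's
    threshold; pages missing from `thresholds` use `default`.
    """
    sessions, current = [], []
    prev_time = prev_url = None
    for ts, url in sorted(clicks):
        if current and ts - prev_time > thresholds.get(prev_url, default):
            sessions.append(current)
            current = []
        current.append(url)
        prev_time, prev_url = ts, url
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Toy site graph: page -> pages it links to.
    links = {"/": ["/news", "/about"], "/news": ["/news/1"], "/news/1": ["/news"]}
    clicks = [(0, "/"), (100, "/news"), (1000, "/news/1"), (2500, "/about")]
    print(sessionize(clicks, page_thresholds(links)))
    # [['/', '/news', '/news/1'], ['/about']]
    print(sessionize(clicks, {}))  # fixed 600 s timeout for every page
    # [['/', '/news'], ['/news/1'], ['/about']]
```

With a single fixed 600-second timeout, the 900-second stay on `/news` splits the visit into three sessions; the topology-aware thresholds keep the first three clicks together. The content-page heuristics mentioned in the abstract are not modeled in this sketch.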
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal (北大核心)
2008, No. 22, pp. 5685-5687, 5690 (4 pages)