Abstract
This paper studies data preprocessing techniques for Web log mining, focusing on Frame page filtering and the setting of the session timeout threshold. Conventional Frame page filtering loses SubFrame page information and requires a site promotion step; an improvement based on the ID3 decision tree induction algorithm is proposed to address both problems. For the timeout threshold, a dynamic correction method replaces the fixed value, which improves the ability of preprocessing to identify long-duration transaction sessions. Experiments show that the method effectively reduces information loss during preprocessing and improves the accuracy of the mining results.
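As a rough illustration of timeout-based session identification, the sketch below splits one visitor's request times into sessions. The abstract does not state the paper's dynamic correction rule, so the `stretch` parameter, the function name `split_sessions`, and the 30-minute default are all assumptions made for this example; the adaptive rule shown (letting the limit grow with the longest gap already seen in the session) is a hypothetical stand-in, not the authors' method.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, base_timeout=timedelta(minutes=30), stretch=None):
    """Split one visitor's sorted request times into sessions.

    With stretch=None a fixed timeout is used. With a numeric stretch,
    the limit for the next gap becomes
    max(base_timeout, stretch * longest gap seen so far in this session).
    This adaptive rule is a hypothetical illustration only.
    """
    sessions = []
    current = []
    longest_gap = timedelta(0)
    for ts in timestamps:
        if current:
            gap = ts - current[-1]
            limit = base_timeout
            if stretch is not None:
                limit = max(base_timeout, longest_gap * stretch)
            if gap > limit:
                # Gap exceeds the limit: close the session and start a new one.
                sessions.append(current)
                current = []
                longest_gap = timedelta(0)
            else:
                longest_gap = max(longest_gap, gap)
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    start = datetime(2009, 1, 1, 9, 0)
    times = [start + timedelta(minutes=m) for m in (0, 20, 55, 130)]
    print([len(s) for s in split_sessions(times)])
    print([len(s) for s in split_sessions(times, stretch=2)])
```

With a fixed threshold, the 35-minute pause in the sample data ends the first session; under the illustrative adaptive rule the same pause is tolerated, so the slow-paced visit stays in one session.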
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
Indexed in: EI, CSCD, PKU Core Journals (北大核心)
2009, No. 12, pp. 2994-2997 (4 pages)
Keywords
Web log
data mining
preprocessing
transaction session identification