Abstract
In the context of sequential pattern mining in Web Usage Mining (WUM), a sequential pattern growth mining algorithm based on server session constraints is proposed. First, to better mine patterns and discover knowledge from web server log files, a server-session-based log file format is proposed. Next, the concept of server-session-based constraints is introduced: because these constraints reduce the number of initial sequential patterns and the size of the candidate itemsets, they shrink the suffix database that must be scanned in each pass. Sequential patterns of frequent access paths in WUM are then mined from the preprocessed log files. Finally, two experiments demonstrate the effectiveness and superiority of the algorithm.
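The abstract does not define the constraint or the algorithm in detail, so the following is only a minimal sketch of the general idea: pattern growth over per-session page sequences, with constraints applied early so that each projected (suffix) database stays small. The function name `mine_sessions`, the `allowed` page filter, and the `max_len` cap are hypothetical stand-ins for a server-session constraint and are not taken from the paper.

```python
from collections import defaultdict


def mine_sessions(sessions, min_support, allowed=None, max_len=None):
    """Return {pattern tuple: support} for frequent page subsequences.

    `allowed` and `max_len` are hypothetical constraints used only for
    illustration; the paper's actual constraint is not specified here.
    """
    # Apply the item constraint before mining, so fewer initial patterns
    # and smaller suffix databases are produced.
    if allowed is not None:
        sessions = [[p for p in s if p in allowed] for s in sessions]
    results = {}

    def grow(prefix, projected):
        # `projected` holds (session index, start offset) pairs, i.e. the
        # suffixes that remain after matching `prefix`.
        counts = defaultdict(int)
        for sid, start in projected:
            for page in set(sessions[sid][start:]):
                counts[page] += 1  # count each page once per session
        for page, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (page,)
            results[pattern] = support
            if max_len is not None and len(pattern) >= max_len:
                continue  # length constraint: stop growing this branch
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sessions[sid].index(page, start)
                except ValueError:
                    continue  # page does not occur in this suffix
                next_projected.append((sid, pos + 1))
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sessions))])
    return results
```

With four toy sessions such as `[["/", "/a", "/b"], ["/", "/b"], ["/a", "/b"], ["/", "/a"]]` and `min_support=2`, passing `allowed={"/", "/b"}` leaves fewer frequent patterns to grow than the unconstrained call, which is the kind of reduction the abstract attributes to the constraint.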
Source
《郑州大学学报(理学版)》
CAS
Peking University Core Journal
2010, Issue 1, pp. 24-28 (5 pages)
Journal of Zhengzhou University:Natural Science Edition
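Funding
National Natural Science Foundation of China
Grant No. 60763012
Guangxi Scientific Research and Technology Development Program, Major Project
Grant No. 0815007-1-15
Guangxi Graduate Innovation Program
Grant No. 2009106030774M03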