Abstract
Traversal patterns summarize the regularities with which web users browse and select pages along URL hyperlinks. Discovering them helps users reach their target pages quickly and thereby supports personalized navigation in search engines. Although some work has been done on mining user traversal patterns from server logs, no systematic algorithm able to handle incremental data has been reported. Mining user traversal patterns can be accomplished in three steps: ① extracting maximal forward reference paths from the server log; ② discovering frequent reference path sequences from the maximal forward reference paths; and ③ filtering the frequent reference path sequences to obtain the maximal frequent reference path sequences. The second step is the core of the problem. To obtain a systematic algorithm, sequential constraints are added to the concept lattice (Galois lattice) model, yielding the ordered concept lattice, which is then used for the incremental discovery of web traversal patterns. An efficient incremental mining algorithm is given and compared with related work. Experimental results on both synthetic and real-life data sets demonstrate the effectiveness of the new algorithm.
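The three steps named in the abstract can be illustrated with a naive, non-incremental sketch. This is not the ordered concept lattice algorithm proposed in the paper, which the abstract does not specify. It assumes that a session is a plain list of page identifiers, that revisiting a page already on the current path counts as a backward move, and that a frequent reference sequence is a contiguous subsequence occurring in at least `min_support` forward paths. All function names are hypothetical.

```python
from collections import Counter


def maximal_forward_paths(session):
    """Step 1: split one click sequence into maximal forward reference paths.

    Assumption: a page already on the current path means the user went back.
    """
    paths, path, moving_forward = [], [], False
    for page in session:
        if page in path:
            # Backward move: the forward run that just ended is maximal.
            if moving_forward:
                paths.append(tuple(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        paths.append(tuple(path))
    return paths


def frequent_sequences(paths, min_support):
    """Step 2: count contiguous subsequences, once per path (brute force)."""
    support = Counter()
    for path in paths:
        n = len(path)
        # A set ensures each subsequence is counted at most once per path.
        support.update({path[i:j] for i in range(n) for j in range(i + 1, n + 1)})
    return {seq: count for seq, count in support.items() if count >= min_support}


def is_contiguous_subsequence(short, long):
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def maximal_sequences(frequent):
    """Step 3: keep frequent sequences not contained in a longer frequent one."""
    return {
        s for s in frequent
        if not any(s != t and is_contiguous_subsequence(s, t) for t in frequent)
    }


# Hypothetical single-user session, one letter per page.
session = list("ABCDCBEGHGWAOUOV")
paths = maximal_forward_paths(session)
frequent = frequent_sequences(paths, min_support=2)
maximal = maximal_sequences(frequent)
```

The brute-force counting in step 2 recomputes everything whenever the log grows, which is the cost that an incremental method such as the one described in the abstract is meant to avoid.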
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal (北大核心)
2003, No. 5, pp. 675-683 (9 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (69673015)
Jilin Province Science and Technology Development Program Fund (20000111)