Abstract
This paper proposes an algorithm, based on Web logs, for clustering Web user communities and Web site URLs. A user transaction matrix for the Web site is built from the preprocessed Web logs, using a characterization of users' browsing behavior and a discretization of users' viewing time, and the Web user communities and site URLs are then clustered on this matrix. Because the clustering considers both the viewing time and the number of hits for each URL, the accuracy and efficiency of the algorithm are improved. The algorithm also handles partial overlap between clusters reasonably well, which makes it more practical. Finally, the effectiveness and scalability of the algorithm are studied experimentally.
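The abstract does not give the construction of the user transaction matrix, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes log records of the form (user, URL, viewing seconds); the threshold values in `discretize` and the choice to combine hit count and viewing-time level by multiplication are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict

def discretize(seconds, bins=(10, 60, 300)):
    """Map a viewing time to an ordinal level 1..len(bins)+1.

    The thresholds are hypothetical; the paper's discretization is not
    specified in the abstract.
    """
    level = 1
    for threshold in bins:
        if seconds >= threshold:
            level += 1
    return level

def build_matrix(log):
    """Build a user-by-URL matrix from (user, url, viewing_seconds) records.

    Each cell is hit_count * discretized_average_viewing_time, which is an
    assumed way of using both signals at once.
    """
    hits = defaultdict(int)
    total_time = defaultdict(float)
    for user, url, seconds in log:
        hits[(user, url)] += 1
        total_time[(user, url)] += seconds

    users = sorted({user for user, _ in hits})
    urls = sorted({url for _, url in hits})
    matrix = [[0] * len(urls) for _ in users]
    for i, user in enumerate(users):
        for j, url in enumerate(urls):
            count = hits.get((user, url), 0)
            if count:
                average = total_time[(user, url)] / count
                matrix[i][j] = count * discretize(average)
    return users, urls, matrix

# Hypothetical log records: (user, url, viewing time in seconds).
sample_log = [
    ("u1", "/a", 5),
    ("u1", "/a", 15),
    ("u1", "/b", 400),
    ("u2", "/b", 30),
]
users, urls, matrix = build_matrix(sample_log)
```

Rows of such a matrix describe users and columns describe URLs, so the same matrix could in principle feed both a user clustering and a URL clustering; how the paper performs the clustering itself and handles overlapping clusters is not recoverable from the abstract.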
Source
《控制与决策》
EI
CSCD
PKU Core Journals (北大核心)
2007, No. 3, pp. 284-288 (5 pages)
Control and Decision
Funding
National Natural Science Foundation of China (Grant No. 60173058)