Abstract
Cluster analysis is an important component of Web log mining systems, and its quality determines the validity of the mining results. This paper introduces a vector clustering method and proposes improvements that address the shortcomings of the original method. First, user transactions are analyzed to obtain a similarity matrix: the user transaction similarity and the user browsing-path similarity are computed separately, and the two are averaged to give the similarity coefficient between each pair of user transactions. The clustering result is then derived from these similarity coefficients. Because the algorithm takes into account the ordered, continuous, and repetitive nature of Web user access, its results reflect users' actual browsing interests.
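The abstract does not give the paper's formulas, so the following Python sketch only illustrates the overall pipeline (two similarities, their average, then clustering on the resulting coefficients). Everything specific in it is an assumption and not the authors' method: Jaccard similarity stands in for the transaction similarity, a longest-common-subsequence ratio stands in for the browsing-path similarity, and a simple threshold-based grouping stands in for the clustering step. The function names, the sample transactions, and the threshold value are hypothetical.

```python
from itertools import combinations


def set_similarity(a, b):
    """Jaccard similarity of the page sets of two transactions (assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def path_similarity(a, b):
    """Order-aware similarity (assumed measure): LCS length / longer path length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def similarity_matrix(transactions):
    """Similarity coefficient = plain average of the two similarities above."""
    n = len(transactions)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = 0.5 * (set_similarity(transactions[i], transactions[j])
                   + path_similarity(transactions[i], transactions[j]))
        sim[i][j] = sim[j][i] = s
    return sim


def cluster(sim, threshold):
    """Assumed clustering rule: link transactions whose coefficient reaches
    the threshold and return one label per connected group."""
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical transactions: each is the ordered list of pages in one session.
transactions = [
    ["A", "B", "C", "B"],
    ["A", "B", "C"],
    ["X", "Y", "Z"],
    ["X", "Z", "Y"],
]
sim = similarity_matrix(transactions)
labels = cluster(sim, threshold=0.6)  # threshold chosen for illustration only
print(labels)
```

In this sketch the set-based term ignores visit order and repeated pages, while the path term is sensitive to order, so averaging the two is one simple way to let both aspects influence the coefficient. How the paper itself weights continuity and repetition is not stated in the abstract.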
Source
《微计算机信息》
2009, No. 21, pp. 184-185, 121 (3 pages in total)
Control & Automation
Keywords
log mining
user transaction
clustering
similarity coefficient