Abstract
This paper studies clustering algorithms for user access patterns based on the query logs of a search engine. It explains why the Jaccard coefficient and weighted similarity measures fall short for clustering user access patterns, proposes an improved Hamming distance formula, and gives a clustering algorithm that uses this distance measure. Analysis of the algorithm indicates that the approach, which is based on a bipartite graph and the improved Hamming distance formula, is accurate and efficient.
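The abstract does not state the improved Hamming distance formula, so the following is only a minimal sketch of the standard building blocks it names: the textbook Jaccard coefficient, the plain (unimproved) Hamming distance over binary click vectors, and a simple threshold-based grouping. The log data, the function names, the `max_dist` threshold, and the greedy single-pass grouping are all assumptions made for illustration and are not the paper's algorithm.

```python
def jaccard(a: set, b: set) -> float:
    """Standard Jaccard coefficient |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hamming(a: set, b: set, universe: list) -> int:
    """Plain Hamming distance between the binary indicator vectors of a and b.

    This is the textbook definition, not the improved formula from the paper.
    """
    return sum((u in a) != (u in b) for u in universe)


def cluster(log: dict, universe: list, max_dist: int) -> list:
    """Greedy single-pass grouping (an illustrative assumption).

    Each query joins the first cluster whose first member is within
    max_dist of it; otherwise it starts a new cluster.
    """
    clusters = []
    for query, urls in log.items():
        for members in clusters:
            if hamming(urls, log[members[0]], universe) <= max_dist:
                members.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical query log: each query maps to the set of URLs clicked for it.
# The query-URL pairs are the edges of a bipartite graph.
log = {
    "q1": {"u1", "u2"},
    "q2": {"u1", "u2", "u3"},
    "q3": {"u4"},
}
universe = sorted(set().union(*log.values()))

print(jaccard(log["q1"], log["q2"]))
print(hamming(log["q1"], log["q2"], universe))
print(cluster(log, universe, max_dist=1))
```

With these made-up sets, q1 and q2 differ in a single URL and fall into the same group, while q3 shares no clicked URL with either and forms its own group.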
Source
Computer Engineering (《计算机工程》)
EI
CAS
CSCD
PKU Core (北大核心)
2006, Issue 20, pp. 38-39, 74 (3 pages in total)