
Efficient Clustering Algorithm Used for Web Search (cited by: 3)
Abstract: This paper studies clustering of user access patterns based on the query logs of a search engine. It explains why the Jaccard coefficient and a weighted similarity measure are poorly suited to clustering user access patterns, proposes an improved Hamming distance formula, and gives a clustering algorithm that uses this distance as its similarity measure. Analysis of the algorithm indicates that the approach, which is based on a bipartite graph and the improved Hamming distance formula, is accurate and efficient.
Source: Computer Engineering (《计算机工程》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2006, No. 20, pp. 38-39, 74 (3 pages)
Keywords: clustering; Hamming distance; search engine
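
The abstract contrasts the Jaccard coefficient with a Hamming-style distance for grouping queries by the pages users clicked. The record does not reproduce the paper's improved formula, so the sketch below uses only the standard textbook definitions of both measures and is not the authors' method. The query-to-URL mapping, the function names, the threshold, and the single-link grouping step are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Standard Jaccard coefficient |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming(a: set, b: set) -> int:
    """Standard Hamming distance between the binary click vectors of a and b.

    The vectors differ exactly at the URLs that appear in one set but not
    the other, so the distance equals the size of the symmetric difference.
    """
    return len(a ^ b)


def cluster_by_distance(clicks: dict, threshold: int) -> list:
    """Group queries whose click sets are within `threshold` of each other.

    `clicks` maps each query to the set of URLs clicked for it, i.e. the
    edges of a query-URL bipartite graph. Grouping is single-link via
    union-find (an illustrative choice, not taken from the paper).
    """
    parent = {q: q for q in clicks}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path compression
            q = parent[q]
        return q

    for q1, q2 in combinations(clicks, 2):
        if hamming(clicks[q1], clicks[q2]) <= threshold:
            parent[find(q1)] = find(q2)

    groups = {}
    for q in clicks:
        groups.setdefault(find(q), []).append(q)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy log: query -> clicked URL identifiers.
toy_clicks = {
    "python tutorial": {"u1", "u2", "u3"},
    "learn python": {"u1", "u2"},
    "cheap flights": {"u4", "u5"},
    "flight deals": {"u4", "u5", "u6"},
}
clusters = cluster_by_distance(toy_clicks, threshold=1)
```

With this toy log, the two Python queries differ in one clicked URL and the two flight queries differ in one, so a threshold of 1 yields two clusters. The pairwise comparison is quadratic in the number of queries; the efficiency the abstract claims for the paper's own algorithm is not reproduced here.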
