
Improvement of a Clustering Algorithm in Chinese Web Retrieval (cited by: 9)
Abstract: This paper improves the HTFC algorithm, which is based on mixed similarity. Two preprocessing steps are required: building a vector space model, and computing the mixed similarity from document text and hyperlinks. The algorithm proceeds in four steps. First, √(kn) documents are selected at random and clustered hierarchically until k clusters remain. Second, these k clusters are refined iteratively until cluster membership no longer changes. Third, a representation is produced for each cluster. Finally, user feedback on the results drives further iterations on the newly generated clusters until the user's needs are met. The first step uses an improved k-means algorithm, which improves running efficiency. The feedback mechanism further corrects the original model, which improves precision.
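The first two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: n is taken to be the number of documents, each document is assumed to be a pair of vectors (text terms, link features), and the mixed similarity is assumed to be a weighted sum of two cosine similarities with a hypothetical weight `alpha`, since the record does not give the actual formula. The cluster representation and user feedback steps are omitted, and all function names are illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mixed_sim(a, b, alpha):
    """ASSUMED form of the mixed similarity: weighted text and link cosines."""
    return alpha * cosine(a[0], b[0]) + (1 - alpha) * cosine(a[1], b[1])


def centroid(docs):
    """Component-wise mean of the text vectors and of the link vectors."""
    n = len(docs)
    text = [sum(col) / n for col in zip(*(d[0] for d in docs))]
    link = [sum(col) / n for col in zip(*(d[1] for d in docs))]
    return (text, link)


def seed_centroids(docs, k, alpha, rng):
    """Step 1: sample about sqrt(k*n) documents, merge them down to k clusters."""
    m = min(len(docs), max(k, math.ceil(math.sqrt(k * len(docs)))))
    clusters = [[d] for d in rng.sample(docs, m)]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        # Merge the two clusters whose centroids are most similar.
        i, j = max(pairs, key=lambda p: mixed_sim(cents[p[0]], cents[p[1]], alpha))
        clusters[i].extend(clusters.pop(j))
    return [centroid(c) for c in clusters]


def cluster(docs, k, alpha=0.7, max_iter=100, seed=0):
    """Step 2: reassign documents to the nearest centroid until nothing changes."""
    rng = random.Random(seed)
    cents = seed_centroids(docs, k, alpha, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(cents)), key=lambda c: mixed_sim(d, cents[c], alpha))
            for d in docs
        ]
        if new_labels == labels:  # membership is stable, so stop iterating
            break
        labels = new_labels
        for c in range(len(cents)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[c] = centroid(members)
    return labels


# Toy data: three documents about one topic, three about another.
docs = [
    ([1.0, 0.1], [1.0, 0.0]), ([0.9, 0.0], [1.0, 0.1]), ([1.0, 0.2], [0.9, 0.0]),
    ([0.1, 1.0], [0.0, 1.0]), ([0.0, 0.9], [0.1, 1.0]), ([0.2, 1.0], [0.0, 0.9]),
]
labels = cluster(docs, k=2)
```

On this toy input the first three documents should end up in one cluster and the last three in the other. Seeding from a small hierarchically clustered sample, rather than from k random documents, keeps the pairwise merging step cheap, because it runs on roughly √(kn) documents instead of all n.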
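Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2005, No. 10, pp. 2685-2687 (3 pages)
Funding: Scientific Research Fund of the Shanghai Municipal Education Commission (04EB12)
Keywords: text clustering algorithm; information retrieval; web mining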

