An Improved K-means Algorithm for Web Document Clustering (cited by: 1)

Abstract: This paper introduces the partitioning-based k-means algorithm that is widely used for Web document clustering, and analyzes the limitations of the vector space model (VSM) and the distance-based similarity measure that k-means relies on. It then proposes a method that improves both the vector space model and the similarity measure. Experiments show that the improved k-means algorithm retains the efficiency of the original k-means algorithm while achieving higher accuracy.
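For orientation, the baseline that the abstract refers to can be sketched as follows: documents are mapped to term-frequency vectors (a simple vector space model) and grouped by standard k-means with Euclidean distance. This is only an illustrative sketch of the unmodified baseline. The abstract does not specify the improved weighting or similarity measure, so none is shown here. The function names, the toy documents, and the choice of seeding centroids with the first k vectors are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_vector(doc, vocab):
    """Map a document to a raw term-frequency vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[term]) for term in vocab]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iter=100):
    """Plain k-means; returns (labels, centroids)."""
    # Assumption: seed centroids with the first k vectors so the run is deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical toy corpus with two obvious topics.
docs = [
    "web page clustering web search",
    "football match score goal",
    "clustering web documents search engine",
    "goal football team match",
]
vocab = sorted({term for d in docs for term in d.lower().split()})
vectors = [tf_vector(d, vocab) for d in docs]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

In this sketch the distance is computed on raw term counts, so document length and frequent terms dominate the result; that is one commonly noted weakness of distance-based similarity on a plain vector space model, and it is the kind of limitation the abstract says the paper addresses.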

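Authors: 王子兴 (Wang Zixing), 冯志勇 (Feng Zhiyong)

Affiliation: Department of Computer Science and Technology, Tianjin University

Source: 《微型电脑应用》 (Microcomputer Applications), 2007, No. 8, pp. 6-8, 4 (3 pages in total)

Keywords: document clustering; k-means algorithm; vector space model (VSM); similarity measure; weighting evaluation function

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]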
