An Approach to Center Selection Based on Minimal Similarity Among Texts
Abstract: Partition-based clustering algorithms converge only to a local optimum, so the choice of initial cluster centers strongly affects both their convergence speed and the quality of the resulting clusters. The initial centers should come from different classes, and the similarity between the center texts should be as small as possible. This paper therefore proposes a new center selection method based on minimal similarity. The method takes the two samples with the smallest mutual similarity as the first two initial centers, then successively selects the sample with the smallest similarity to the centers already chosen as the center of each remaining class. Experiments show that the method selects samples from different classes as the initial centers and that, compared with other initial center selection methods, it clearly improves clustering performance.
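
The selection procedure described in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which similarity measure is used or how "similarity to the centers already chosen" is aggregated, so the sketch assumes cosine similarity over term-weight vectors and uses the sum of similarities to the current centers. The names `cosine_similarity`, `select_initial_centers`, and the toy `docs` data are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_initial_centers(vectors, k):
    """Return the indices of k samples to use as initial cluster centers."""
    n = len(vectors)
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= len(vectors)")

    # Pairwise similarity matrix (O(n^2) time and memory).
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]

    # Step 1: the two least similar samples become the first two centers.
    first, second = min(combinations(range(n), 2),
                        key=lambda pair: sim[pair[0]][pair[1]])
    centers = [first, second]

    # Step 2: repeatedly add the sample least similar to the chosen centers.
    # Assumption: "least similar" means the smallest summed similarity.
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        nxt = min(candidates,
                  key=lambda i: sum(sim[i][c] for c in centers))
        centers.append(nxt)
    return centers


# Toy data: three groups of two documents each, one group per axis.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
]
centers = select_initial_centers(docs, 3)
print(centers)
```

The returned indices can then be passed to a K-means implementation as its starting centers. Replacing the sum with the maximum similarity to any chosen center is an equally plausible reading of the abstract and would change only the `key` function in step 2.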
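Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), 2008, No. 3, pp. 198-201 (4 pages). Indexed in CAS and the Peking University Core Journals list.
Funding: National 863 Program of China (2006AA01Z148); Key Project of Science and Technology Research of the Ministry of Education (207148)
Keywords: K-means; minimal similarity; text clustering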

