Application of Spherical Fuzzy C-means Algorithm in Clustering Chinese Documents

Times cited: 8
Abstract: General clustering algorithms can assign a given document to only one class, but in practice a document may belong to several classes. This paper presents a clustering method for Chinese documents based on the spherical fuzzy c-means algorithm. The method considers only the direction of the document vectors, not their magnitude. It also takes full account of the degree to which each document belongs to each class, and a user-specified threshold allows a document to be assigned to several classes. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but can also identify documents that belong to several classes.
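The abstract describes three ingredients: document vectors compared by direction only, fuzzy memberships, and a user threshold that can place one document in several classes. The abstract does not give the update equations, so the sketch below assumes one common formulation of spherical fuzzy c-means and should not be read as the authors' exact method. Vectors are normalized to unit length, the dissimilarity is 1 - cos(x, v), memberships follow the standard fuzzy c-means rule u_ik = 1 / sum_j (d_ik / d_jk)^(1/(m-1)), and each prototype is the u^m-weighted sum of documents projected back onto the unit sphere. The function names, the farthest-first initialization, the `eps` guard, the 0.3 threshold and the toy vectors are all hypothetical choices made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_dissimilarity(x, v):
    """1 - cos(x, v) for unit vectors x and v; 0 means the same direction."""
    return 1.0 - sum(a * b for a, b in zip(x, v))


def spherical_fcm(docs, c, m=2.0, max_iter=100, tol=1e-6, eps=1e-12):
    """Sketch of spherical fuzzy c-means (assumed formulation, not the paper's).

    docs: term-weight vectors; c: number of clusters (c <= len(docs));
    m > 1: fuzzifier. Returns (u, protos), where u[i][k] is the membership of
    document k in cluster i and protos are the unit-length cluster prototypes.
    """
    xs = [normalize(d) for d in docs]
    n, dim = len(xs), len(xs[0])
    # Hypothetical deterministic farthest-first initialization: start from
    # document 0, then repeatedly add the document least similar to all
    # prototypes chosen so far.
    protos = [list(xs[0])]
    while len(protos) < c:
        k = max(range(n), key=lambda j: min(cosine_dissimilarity(xs[j], p) for p in protos))
        protos.append(list(xs[k]))
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (1 / (m - 1)).
        # eps keeps a document that coincides with a prototype from dividing by 0.
        for k in range(n):
            d = [max(cosine_dissimilarity(xs[k], p), eps) for p in protos]
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Prototype update: u**m-weighted sum of documents, projected back
        # onto the unit sphere so that prototypes also carry direction only.
        shift = 0.0
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            new_p = normalize([sum(w[k] * xs[k][t] for k in range(n)) for t in range(dim)])
            shift = max(shift, max(abs(a - b) for a, b in zip(new_p, protos[i])))
            protos[i] = new_p
        if shift < tol:
            break
    return u, protos


def assign_by_threshold(u, threshold):
    """Assign document k to every cluster i with u[i][k] >= threshold."""
    return [[i for i in range(len(u)) if u[i][k] >= threshold] for k in range(len(u[0]))]


# Toy term-weight vectors: documents 0-2 use terms 0-1, documents 3-5 use
# terms 2-3, and document 6 uses all four terms equally.
docs = [[2, 1, 0, 0], [4, 2, 0, 0], [1, 2, 0, 0],
        [0, 0, 2, 1], [0, 0, 4, 2], [0, 0, 1, 2],
        [1, 1, 1, 1]]
u, protos = spherical_fcm(docs, c=2)
print(assign_by_threshold(u, threshold=0.3))
```

Documents 0 and 1 point in the same direction but differ in length, so they receive the same memberships, which reflects the direction-only comparison. Document 6 ends up with a membership of about 0.5 in each cluster, so a 0.3 threshold places it in both clusters, while every other document falls into exactly one.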
Source: Journal of System Simulation (系统仿真学报), 2004, No. 3, pp. 516-518 (3 pages); indexed in CAS and CSCD.
Funding: Young Scientists Fund of the National Natural Science Foundation of China (grant 60303024).
Keywords: Chinese documents; spherical fuzzy c-means algorithm; clustering; text mining