Abstract
Conventional clustering algorithms assign each document to exactly one class, but in practice a document often belongs to several classes. This paper presents a clustering method for Chinese documents based on the spherical fuzzy c-means algorithm. The method considers only the direction of each document vector and ignores its magnitude. It also takes full account of the degree to which each document belongs to each class, and it uses a user-specified threshold to assign a document to multiple classes. Experiments show that the spherical fuzzy c-means algorithm not only achieves good clustering accuracy but also identifies documents that belong to several classes.
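The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the common spherical fuzzy c-means formulation in which document vectors are scaled to unit length, dissimilarity is 1 minus cosine similarity, and memberships follow the standard fuzzy c-means update with fuzzifier `m`. The function names, the farthest-first initialization, the default values, and the toy term-weight vectors are all hypothetical choices made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))

def init_centers(X, c):
    """Assumed initialization: start from X[0], then repeatedly add the
    document that is least similar to the centers chosen so far."""
    centers = [X[0]]
    while len(centers) < c:
        nxt = min(X, key=lambda x: max(cosine(x, ctr) for ctr in centers))
        centers.append(nxt)
    return [list(v) for v in centers]

def memberships(X, centers, m):
    """Fuzzy memberships from the dissimilarity d = 1 - cosine (assumed form)."""
    U = []
    for x in X:
        d = [max(1.0 - cosine(x, ctr), 1e-12) for ctr in centers]
        w = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(w)
        U.append([wk / total for wk in w])
    return U

def spherical_fcm(docs, c, m=2.0, iters=50):
    """Return (membership matrix, unit-length cluster centers)."""
    X = [normalize(d) for d in docs]
    centers = init_centers(X, c)
    for _ in range(iters):
        U = memberships(X, centers, m)
        new_centers = []
        for k in range(c):
            s = [0.0] * len(X[0])
            for x, u in zip(X, U):
                wt = u[k] ** m
                for j, xj in enumerate(x):
                    s[j] += wt * xj
            # Project the weighted sum back onto the unit sphere.
            new_centers.append(normalize(s))
        centers = new_centers
    return memberships(X, centers, m), centers

def assign(U, threshold):
    """Give each document every class whose membership reaches the threshold."""
    return [[k for k, u in enumerate(row) if u >= threshold] for row in U]

# Toy term-weight vectors (hypothetical data); the last one mixes both topics.
docs = [[3, 0, 0], [1, 0.1, 0], [0, 2, 0], [0, 5, 0.2], [1, 1, 0]]
U, centers = spherical_fcm(docs, c=2)
labels = assign(U, threshold=0.3)
print(labels)
```

Because every vector is normalized first, `[3, 0, 0]` and `[1, 0, 0]` are treated identically, which mirrors the abstract's point that vector magnitude is ignored. The threshold is the user-controlled parameter: lowering it lets more documents fall into several classes, while raising it toward 1 approaches a hard, single-class assignment.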
Source
《系统仿真学报》
CAS
CSCD
2004, Issue 3, pp. 516-518 (3 pages)
Journal of System Simulation
Funding
Supported by the Young Scientists Fund of the National Natural Science Foundation of China (60303024)