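Abstract
I. Introduction
With the rapid development of the Internet, people can obtain more information online, but an excess of information often leaves readers disoriented. Classifying information is an effective way to make it usable, and clustering is a technique commonly used to divide texts into categories; its distinguishing feature is that it finds a cluster partition of a given text collection without requiring a training set [1~5].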
Large-scale text processing has become a great challenge with the rapid growth of the Internet and the explosion of information. Clustering is an effective way to address this problem. This paper presents an incremental algorithm, Multi-level CFK-means, for large-scale text clustering. By using the clustering features (CF) structure, the algorithm retains and exploits more cluster information. Clustering results are obtained quickly in a single scan of the data. The computing and file-exchange time of the algorithm is several times lower than that of the k-means algorithm, while the accuracy of its results is almost equal to that of k-means. The effectiveness of the algorithm is demonstrated by comparative experiments on Reuters text sets.
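The abstract does not define the CF structure or the multi-level procedure. As a rough illustration only, the sketch below assumes a BIRCH-style clustering feature (point count, linear sum, sum of squared norms) and a simple one-scan assignment in which the first k vectors seed the clusters. The names `ClusteringFeature`, `sq_dist`, and `one_scan_cluster` are hypothetical, and this is not the authors' Multi-level CFK-means algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ClusteringFeature:
    """Assumed BIRCH-style summary of a cluster: (n, ls, ss)."""
    dim: int
    n: int = 0                                      # number of points absorbed
    ls: List[float] = field(default_factory=list)   # per-dimension linear sum
    ss: float = 0.0                                 # sum of squared norms

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, x: Sequence[float]) -> None:
        """Absorb one vector incrementally, without storing the vector itself."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def one_scan_cluster(
    points: Sequence[Sequence[float]], k: int
) -> Tuple[List[ClusteringFeature], List[int]]:
    """Single pass over the data: the first k points seed k CFs
    (an assumption made for this sketch); each later point joins
    the CF whose current centroid is nearest."""
    cfs: List[ClusteringFeature] = []
    labels: List[int] = []
    for x in points:
        if len(cfs) < k:
            cf = ClusteringFeature(dim=len(x))
            cf.add(x)
            cfs.append(cf)
            labels.append(len(cfs) - 1)
        else:
            j = min(range(k), key=lambda c: sq_dist(cfs[c].centroid(), x))
            cfs[j].add(x)
            labels.append(j)
    return cfs, labels
```

Because each cluster is represented only by its summary, every vector is read once and then discarded, which is the property that makes a single scan of the data possible. In practice the input vectors would be term-weight vectors of documents rather than the short lists used for illustration.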
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2002, No. 9, pp. 13-15 (3 pages)
Computer Science
Keywords
Information processing
Clustering features
Large-scale text clustering algorithm
Computer
Clustering features (CF), Multi-level CFK-means algorithm, Text clustering