Abstract
This article proposes a document clustering method that groups similar documents in a user's search results and presents them as a directory structure, helping the user browse the results. It first analyzes existing text clustering methods and discusses their advantages and disadvantages. It then proposes a suffix-tree-based clustering algorithm for Chinese text and describes in detail the principle of the algorithm and how the tree is constructed and used. Finally, it discusses the key problems encountered while implementing the algorithm and the corresponding solutions.
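The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea behind suffix-tree-based clustering, not the article's method. It assumes that each Chinese character is treated as one token, uses a naive depth-limited suffix trie in place of a compact suffix tree, and treats any phrase shared by at least two documents as a candidate cluster. The names `build_suffix_trie`, `base_clusters`, `label_clusters` and the parameters `max_depth`, `min_docs`, `min_len` are hypothetical and chosen for illustration.

```python
class Node:
    """One trie node: child characters plus the ids of documents passing through."""

    def __init__(self):
        self.children = {}
        self.docs = set()


def build_suffix_trie(docs, max_depth=6):
    """Insert every suffix of every document, truncated to max_depth characters.

    Assumption: a naive character-level suffix trie stands in for a
    compact suffix tree; it is simpler but uses more memory.
    """
    root = Node()
    for doc_id, text in enumerate(docs):
        for start in range(len(text)):
            node = root
            for ch in text[start:start + max_depth]:
                node = node.children.setdefault(ch, Node())
                node.docs.add(doc_id)
    return root


def base_clusters(root, min_docs=2, min_len=2):
    """Collect (phrase, document set) pairs for phrases shared by enough documents."""
    clusters = []
    stack = [(root, "")]
    while stack:
        node, phrase = stack.pop()
        if len(phrase) >= min_len and len(node.docs) >= min_docs:
            clusters.append((phrase, frozenset(node.docs)))
        for ch, child in node.children.items():
            stack.append((child, phrase + ch))
    return clusters


def label_clusters(clusters):
    """Keep one label per document set: the longest shared phrase.

    Assumption: this is a simplification of cluster merging; ties in
    length are broken by string order so the result is deterministic.
    """
    labels = {}
    for phrase, doc_set in clusters:
        best = labels.get(doc_set)
        if best is None or (len(phrase), phrase) > (len(best), best):
            labels[doc_set] = phrase
    return labels


if __name__ == "__main__":
    results = ["文本聚类算法", "中文文本聚类", "后缀树结构"]
    labels = label_clusters(base_clusters(build_suffix_trie(results)))
    for doc_set, phrase in labels.items():
        print(phrase, sorted(doc_set))
```

In this sketch the shared phrase doubles as the directory label for its group of search results, which is one plausible way to obtain the directory structure the abstract mentions. Working at the character level sidesteps Chinese word segmentation, at the cost of occasionally producing labels that cut across word boundaries.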
Source
《上海师范大学学报(自然科学版)》
2006, No. 5, pp. 21-26 (6 pages)
Journal of Shanghai Normal University (Natural Sciences)
Keywords
suffix tree clustering
text clustering
text processing