Abstract
This paper proposes a similarity-based document clustering algorithm to overcome the shortcomings of common information retrieval techniques. The document collection is converted into a set of vectors, and clusters are then obtained with an agglomerative hierarchical clustering algorithm based on the cosine similarity between vectors. A detailed description of the algorithm and a test example are given.
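The pipeline in the abstract (documents to vectors, cosine similarity, agglomerative hierarchical clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the whitespace tokenization and raw term-count weighting, the average-link merge criterion, and the similarity-threshold stopping rule (`threshold`) are all hypothetical choices, since the abstract does not specify them.

```python
import math
from collections import Counter


def to_vectors(docs):
    """Convert each document into a sparse term-frequency vector (term -> count).

    Assumption: lowercase whitespace tokenization with raw counts; the paper
    may use a different weighting scheme (e.g. TF-IDF).
    """
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def agglomerative(docs, threshold):
    """Agglomerative hierarchical clustering on cosine similarity.

    Starts with one singleton cluster per document and repeatedly merges the
    pair of clusters with the highest average-link similarity (hypothetical
    choice of linkage), stopping when no pair reaches `threshold` or only one
    cluster is left. Returns clusters as lists of document indices.
    """
    vecs = to_vectors(docs)
    n = len(vecs)
    # Precompute the pairwise document similarity matrix.
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                # Average-link: mean similarity over all cross-cluster pairs.
                avg = sum(sim[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b does not shift index a
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Made-up toy documents for illustration, not the paper's test example.
    docs = ["apple banana fruit", "banana fruit salad",
            "cpu gpu memory", "gpu memory bandwidth"]
    print(agglomerative(docs, 0.3))
```

On these toy documents, each topical pair has cosine similarity 2/3 and the cross-topic pairs have 0, so a threshold of 0.3 yields `[[0, 1], [2, 3]]`. Lowering the threshold to 0 lets the loop continue until everything merges into one cluster, which is how the full dendrogram would be built. Recomputing average-link similarity from the pairwise matrix on each pass keeps the sketch short but is cubic or worse in the number of documents; a production version would update inter-cluster similarities incrementally.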
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》)
2002, No. 12, pp. 59-61 (3 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals