Abstract
This paper proposes Maximum Similarity Priority Clustering, a document clustering method based on a generalized suffix tree. A generalized suffix tree is built over the document collection, and each phrase extracted from it is treated as a distinct feature term and mapped into an M-dimensional vector space model. The pairwise document similarity matrix is then computed, and the similarities are sorted in descending order. Document pairs with the highest similarity are merged first to form initial clusters, which are in turn merged to yield the final clustering result.
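The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: word n-grams shared by at least two documents stand in for phrases extracted from a generalized suffix tree, cosine similarity is assumed as the similarity measure, and a hypothetical `threshold` parameter decides when merging stops. The abstract does not state the criterion for merging initial clusters, so the sketch folds both merge phases into a single pass over the sorted pairs using union-find. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def phrases(tokens, max_len=3):
    """All word n-grams up to max_len (assumed stand-in for suffix-tree phrases)."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]


def build_vectors(docs, max_len=3):
    """Map each document to a sparse phrase-count vector.

    Only phrases that occur in at least two documents are kept, which
    approximates the shared phrases a generalized suffix tree would expose.
    """
    counts = [Counter(phrases(d.lower().split(), max_len)) for d in docs]
    doc_freq = Counter(p for c in counts for p in c)
    return [{p: n for p, n in c.items() if doc_freq[p] >= 2} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def max_similarity_priority_clustering(docs, threshold=0.3):
    """Merge document pairs in descending order of similarity.

    `threshold` is a hypothetical stopping parameter, not taken from the paper.
    """
    vectors = build_vectors(docs)
    # Pairwise similarities (upper triangle of the matrix), sorted descending.
    pairs = sorted(((cosine(vectors[i], vectors[j]), i, j)
                    for i, j in combinations(range(len(docs)), 2)),
                   reverse=True)
    parent = list(range(len(docs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sim, i, j in pairs:
        if sim < threshold:
            break  # every remaining pair is less similar
        parent[find(i)] = find(j)  # most similar pairs are merged first

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [
    "suffix tree clustering of documents",
    "clustering of documents with suffix tree",
    "stock market prices fell today",
    "stock market prices rose today",
]
result = max_similarity_priority_clustering(docs)
```

With these four sample sentences, the two suffix-tree documents and the two stock-market documents share no phrases across groups, so `result` holds two clusters of document indices. A real implementation would likely build the suffix tree explicitly (for example with Ukkonen's algorithm) instead of enumerating n-grams, which grows quickly with `max_len`.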
Source
China Science and Technology Information (《中国科技信息》)
2013, No. 3, pp. 89-91 (3 pages)
Funding
Chongqing Science and Technology Commission (grant No. cstc2012gg-yyjsB40006)
Keywords
clustering algorithms
suffix tree
maximum similarity
vector space model