Abstract
Traditional text spectral clustering methods based on the vector space model tend to ignore the semantic relations between the contexts of a text. Representing texts as graph structures addresses this problem well. On this basis, this paper proposes a spectral clustering algorithm based on the maximum common subgraph, hereafter called SC-MCS. The algorithm computes the similarity between texts by solving for their maximum common subgraph and then clusters the texts. Experimental results show that, compared with the traditional vector-space-based text spectral clustering method, the algorithm achieves some improvement in both accuracy and recall.
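The abstract does not spell out how the maximum common subgraph is turned into a similarity score, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each text becomes an undirected word graph whose nodes are distinct words (unique labels) and whose edges link adjacent words. Under that assumption the maximum common subgraph reduces to the shared nodes and shared edges. The size measure (nodes plus edges) and the normalisation by the larger graph are also assumptions, and all function names are hypothetical.

```python
def text_to_graph(tokens, window=2):
    """Build an undirected word graph from a token list.

    Nodes are the distinct tokens; an edge joins two different tokens that
    occur fewer than `window` positions apart (window=2 means adjacent).
    """
    nodes = set(tokens)
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                edges.add(frozenset((word, other)))
    return nodes, edges


def graph_size(graph):
    """Assumed size measure: number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_size(g1, g2):
    """Size of the maximum common subgraph of two uniquely labelled graphs.

    Because node labels are unique, the common subgraph is simply the
    intersection of the node sets together with the intersection of the
    edge sets; no subgraph-isomorphism search is needed.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return len(nodes1 & nodes2) + len(edges1 & edges2)


def mcs_similarity(g1, g2):
    """Assumed similarity: |mcs(g1, g2)| / max(|g1|, |g2|), in [0, 1]."""
    denom = max(graph_size(g1), graph_size(g2))
    return mcs_size(g1, g2) / denom if denom else 0.0


def affinity_matrix(docs):
    """Pairwise similarity matrix for whitespace-tokenised documents."""
    graphs = [text_to_graph(doc.split()) for doc in docs]
    return [[mcs_similarity(a, b) for b in graphs] for a in graphs]


docs = [
    "graph based text clustering",
    "graph based text retrieval",
    "stock market prices fall",
]
for row in affinity_matrix(docs):
    print([round(value, 3) for value in row])
```

The resulting symmetric matrix plays the role of the affinity matrix that a spectral clustering step would consume. That step (building the graph Laplacian, taking its leading eigenvectors, and running k-means on them) is omitted here to keep the sketch dependency-free.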
Authors
冯仁群山
陈笑蓉
FENG Renqunshan; CHEN Xiaorong (College of Computer Science and Technology, Guizhou University, Guiyang 550025, China)
Source
《贵州大学学报(自然科学版)》
2018, No. 2, pp. 82-87 (6 pages)
Journal of Guizhou University:Natural Sciences
Funding
Supported by the National Natural Science Foundation of China (61363028)
Keywords
text clustering
spectral clustering
maximum common subgraph