Abstract
Conventional techniques for short-text clustering suffer from poor similarity-calculation accuracy and unstable clustering results. To address these problems, a similarity calculation method based on the HowNet semantic lexicon and the BTM topic model is proposed; the two measures are linearly combined to assess short-text similarity comprehensively. An evaluation index for clustering results, based on clustering quality and the degree of difference between clusterings, is established to rank the results. The higher-quality results are selected and fused using the CSPA fusion algorithm. Experimental results show that the proposed method improves the accuracy of short-text similarity calculation and the stability of the fused result.
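The two computational steps named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the combination weight nor the exact formulas, so the weight `alpha`, the function names `combined_similarity` and `co_association`, and the input values are all hypothetical. The HowNet and BTM similarity scores are taken as already computed, and the final partitioning step of CSPA (re-clustering the co-association matrix) is omitted.

```python
from typing import List, Sequence


def combined_similarity(sem_sim: float, topic_sim: float, alpha: float = 0.5) -> float:
    """Linearly combine a semantic similarity and a topic similarity.

    Both inputs are assumed to lie in [0, 1]. `alpha` is a hypothetical
    weight; the abstract does not state the value the authors used.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sem_sim + (1.0 - alpha) * topic_sim


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build the pairwise similarity matrix that CSPA-style fusion starts from.

    Entry (i, j) is the fraction of base clusterings that place items
    i and j in the same cluster.
    """
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same items")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


if __name__ == "__main__":
    # Hypothetical scores for one pair of short texts.
    print(combined_similarity(0.8, 0.4, alpha=0.5))

    # Two hypothetical base clusterings of three short texts.
    for row in co_association([[0, 0, 1], [0, 1, 1]]):
        print(row)
```

In a full pipeline, the matrix returned by `co_association` would be passed to a graph-partitioning or similarity-based clustering routine to obtain the fused clustering.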
Authors
阳小兰
杨威
钱程
朱福喜
YANG Xiao-lan, YANG Wei, QIAN Cheng, ZHU Fu-xi (School of Information and Engineering, Wuchang University of Technology, Wuhan 430223, China; Computer School, Wuhan University, Wuhan 430072, China)
Source
《计算机工程与设计》
Peking University Core Journal
2017, Issue 5, pp. 1258-1263 (6 pages)
Computer Engineering and Design
Funding
Natural Science Foundation of Hubei Province (2014CFB356)