
Improved K-means clustering algorithm combined semantic similarity of short text (cited by: 14)
Abstract Short text clustering faces three main challenges: the sparsity of feature keywords, the complexity of processing in high-dimensional space, and the comprehensibility of clusters. To address them, a K-means short text clustering algorithm improved with semantic information is proposed. The algorithm represents each short text as a set of words, which alleviates the sparsity of feature keywords. It obtains the initial cluster centers by mining the maximum frequent word sets of the short text collection, which effectively overcomes the sensitivity of K-means to the initial cluster centers and addresses the comprehensibility of clusters. It computes the similarity between documents with a semantic similarity combined with TF-IDF values, which avoids computation in high-dimensional space. Experimental results show that the short text clustering algorithm built from a semantic perspective outperforms traditional short text clustering algorithms.
Authors QIU Yunfei (邱云飞); ZHAO Bin (赵彬); LIN Mingming (林明明); WANG Wei (王伟) (School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2016, No. 19, pp. 78-83 (6 pages)
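Funding National Natural Science Foundation of China (No.71371091); Liaoning Province Growth Program for Outstanding Young Scholars in Higher Education Institutions (No.LJQ2012027); General Project of the Education Department of Liaoning Province (No.L2013131)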
Keywords text mining; clustering of short text; K-means algorithm; maximum frequent word set; HowNet; semantic similarity
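
The abstract names three steps: representing each short text as a word set, seeding the cluster centers from the maximum frequent word sets, and comparing texts with a TF-IDF-weighted semantic similarity. The record does not give the paper's formulas, so the following Python is only a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy data, the brute-force mining, the binary term frequency, and the symmetric best-match averaging are all hypothetical choices made for illustration. `word_sim` stands in for a HowNet-based word similarity and only checks exact matches here.

```python
from itertools import combinations
from math import log


def maximal_frequent_word_sets(docs, min_support):
    """Apriori-style mining over word sets; returns sets with no frequent superset."""
    words = sorted({w for d in docs for w in d})
    current = [frozenset([w]) for w in words
               if sum(w in d for d in docs) >= min_support]
    frequent = []
    while current:
        frequent.extend(current)
        # Join step: merge two k-sets that differ in exactly one word.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1}
        current = [c for c in candidates
                   if sum(c <= d for d in docs) >= min_support]
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return sorted(maximal, key=sorted)  # deterministic order


def tfidf_weights(docs):
    """Assumption: each word occurs once per short text, so TF is 1 and the weight is a smoothed IDF."""
    df = {}
    for d in docs:
        for w in d:
            df[w] = df.get(w, 0) + 1
    return {w: log(len(docs) / c) + 1.0 for w, c in df.items()}


def word_sim(a, b):
    """Placeholder for a HowNet-based word similarity (assumption: exact match only)."""
    return 1.0 if a == b else 0.0


def text_sim(x, y, weights, sim=word_sim):
    """Weighted average of each word's best match in the other text, symmetrised."""
    def directed(src, dst):
        total = sum(weights.get(w, 1.0) for w in src)
        if total == 0 or not dst:
            return 0.0
        return sum(weights.get(w, 1.0) * max(sim(w, v) for v in dst)
                   for w in src) / total
    return 0.5 * (directed(x, y) + directed(y, x))


def assign(docs, centers, weights):
    """Assign each text to the most similar center (one K-means assignment step)."""
    return [max(range(len(centers)),
                key=lambda k: text_sim(d, centers[k], weights))
            for d in docs]


# Toy collection of short texts, already segmented into word sets.
docs = [
    {"phone", "battery", "life"},
    {"phone", "battery", "charge"},
    {"battery", "phone", "screen"},
    {"movie", "actor", "plot"},
    {"movie", "actor", "award"},
    {"actor", "movie", "director"},
]
weights = tfidf_weights(docs)
centers = maximal_frequent_word_sets(docs, min_support=3)
labels = assign(docs, centers, weights)
print(centers)
print(labels)
```

The similarity is computed directly on word sets, so no document is expanded into a vocabulary-sized vector, and each initial center is itself a readable word set. The sketch stops after the first assignment step; updating the centers and iterating to convergence is left out because the record does not describe how the paper does it.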
