
K-means Weibo Topic Discovery Model Based on Feature Fusion (cited by: 6)
Abstract: Traditional topic detection methods suffer from high-dimensional, sparse representations when applied to short Weibo texts. To address this, a K-means Weibo topic discovery model based on feature fusion is proposed. To better express the semantic information of Weibo topics, a vector model built on word pairs that co-occur within a sentence (Biterm_VSM) replaces the traditional vector space model (VSM), and a topic model (Latent Dirichlet Allocation, LDA) is used to mine the latent semantics of the short texts. The features obtained from the two models are fused, and the K-means clustering algorithm is then applied to discover topics. Experimental results show that the model achieves an adjusted Rand index (ARI) of 0.80, which is 3% to 6% higher than traditional topic detection methods.
Authors: Li Hailei (李海磊); Yang Wenzhong (杨文忠); Li Donghao (李东昊); Wen Jiebin (温杰彬); Qian Yunyun (钱芸芸). Affiliations: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data (PSRPC), China Academy of Electronics and Information Technology, Beijing 100041, China.
Source: Application of Electronic Technique (《电子技术应用》), 2020, No. 4, pp. 24-28, 33 (6 pages in total)
Funding: National Natural Science Foundation of China project (U1603115); Autonomous Region Natural Science Foundation project (2017D01C042).
Keywords: topic detection; Biterm_VSM; LDA; feature fusion; K-means
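The pipeline summarized in the abstract (biterm features, LDA topic features, fusion, K-means, ARI evaluation) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the toy documents, the function names, the use of plain concatenation as the fusion rule, and the first-k-points centroid seeding are all assumptions made for illustration. The `topic_props` values are hand-written placeholders standing in for the document-topic proportions a trained LDA model would produce.

```python
from collections import Counter
from itertools import combinations
from math import comb

def biterms(tokens):
    """Unordered pairs of distinct words co-occurring in one short text."""
    return [tuple(sorted(p)) for p in combinations(tokens, 2) if p[0] != p[1]]

def biterm_vectors(docs):
    """L2-normalized biterm count vectors over a shared biterm vocabulary."""
    vocab = sorted({b for d in docs for b in biterms(d)})
    index = {b: i for i, b in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for b, count in Counter(biterms(d)).items():
            vec[index[b]] = float(count)
        norm = sum(v * v for v in vec) ** 0.5 or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def fuse(biterm_vec, topic_vec):
    """Feature fusion by concatenation (assumed rule, not from the paper)."""
    return list(biterm_vec) + list(topic_vec)

def kmeans(points, k, iters=20):
    """Lloyd's K-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from pair counts."""
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0
    sum_ij = sum(comb(n, 2) for n in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(n, 2) for n in Counter(truth).values())
    sum_b = sum(comb(n, 2) for n in Counter(pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Toy tokenized posts (hypothetical data), interleaved by topic.
docs = [
    ["earthquake", "rescue", "team"],
    ["football", "match", "goal"],
    ["earthquake", "rescue", "donation"],
    ["football", "goal", "league"],
]
# Placeholder document-topic proportions standing in for LDA output.
topic_props = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
truth = [0, 1, 0, 1]

fused = [fuse(b, t) for b, t in zip(biterm_vectors(docs), topic_props)]
labels = kmeans(fused, k=2)
score = adjusted_rand_index(truth, labels)
print(labels, score)
```

On real Weibo data the posts would first need Chinese word segmentation, and the placeholder topic proportions would come from an actual LDA model; the relative weighting of the two feature groups before clustering is a design choice this sketch does not explore.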