
Analysis on topic evolution of news comments by combining word vector and clustering algorithm (times cited: 14)
Abstract: Topic evolution analysis mines how the content of a topic changes over time. Assuming that a topic's content can be represented by keywords, this article trains word2vec on 750,000 news and microblog texts to build a word vector model. The processed text stream is fed into the model to obtain word vectors for all words along the time series, and K-means is then used to cluster these word vectors, which extracts the topic keywords and allows topic evolution to be visualized. Experiments comparing this approach with topic extraction based on the PLSA and LDA topic models show that the word-vector approach performs better than both topic models. In addition, collecting a sufficiently large and varied corpus makes it possible to train a model with stronger generalization ability, which benefits research on real-time topic evolution analysis.
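The keyword-extraction step described in the abstract (clustering word vectors with K-means) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the word vectors for one time slice have already been produced by a word2vec model (for example, gensim's `Word2Vec`, whose trained vectors are exposed through `model.wv`), and it substitutes hand-made 2-D toy vectors (`TOY_WORD_VECTORS`). The names `kmeans` and `topic_keywords`, the seeded random initialization, and the rule that a topic's keywords are the words closest to its cluster centroid are all assumptions made for this sketch.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy vectors standing in for word2vec output for one time slice.
TOY_WORD_VECTORS: Dict[str, Vector] = {
    "flood": (0.90, 0.10),
    "rain": (0.85, 0.15),
    "rescue": (0.80, 0.20),
    "typhoon": (0.95, 0.05),
    "stock": (0.10, 0.90),
    "market": (0.15, 0.85),
    "shares": (0.05, 0.95),
    "investor": (0.20, 0.80),
}


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def kmeans(vectors: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-means: returns a cluster label per vector and the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct input points
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return labels, centroids


def topic_keywords(word_vectors: Dict[str, Vector], k: int, top_n: int = 3,
                   seed: int = 0) -> List[List[str]]:
    """Cluster word vectors; each cluster's words nearest its centroid are its keywords."""
    words = sorted(word_vectors)  # fixed order keeps runs reproducible
    vectors = [word_vectors[w] for w in words]
    labels, centroids = kmeans(vectors, k, seed=seed)
    topics = []
    for c in range(k):
        members = [w for w, lab in zip(words, labels) if lab == c]
        members.sort(key=lambda w: squared_distance(word_vectors[w], centroids[c]))
        topics.append(members[:top_n])
    return topics


if __name__ == "__main__":
    for i, keywords in enumerate(topic_keywords(TOY_WORD_VECTORS, k=2)):
        print(f"topic {i}: {keywords}")
```

This sketch uses plain Euclidean K-means. Normalizing each word vector to unit length first would make the clustering behave more like cosine similarity, which is commonly used with word2vec. To follow topic evolution, one could (again an assumption) run `topic_keywords` separately on the vectors of each time window and compare the resulting keyword lists.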
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2016, No. 11, pp. 2368-2374 (7 pages)
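Funding: National Social Science Fund of China (12BYY045); Guangdong Province 12th Five-Year Plan Project for Philosophy and Social Sciences (GD15YTS01)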
Keywords: topic evolution; word2vec; PLSA; LDA
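Related literature
References: 9
Secondary references: 139
Co-citing literature: 188
Co-cited literature: 133
Citing literature: 14
Secondary citing literature: 108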
