

Short text topic model based on word vector and variational autoencoder
Abstract: To address the sparsity of short texts and improve the performance of topic models, a topic model with embedded word vectors is proposed. First, each document is assumed to contain only one topic. Second, word vectors are used to expand and adjust the topics at each iteration: for each topic, a non-parametric probabilistic sampling method draws a set of words, word vectors are then used to find similar words, and the weights of those similar words under the topic are increased. Finally, the topic distribution is replaced by a Laplace approximation so that it can be used directly in training a variational autoencoder, which speeds up training. Experimental results show that the topics learned by the proposed model are more interpretable and outperform those of other mainstream models, offering more possibilities for topic extraction from short texts. During topic model training, using word vectors to intervene in the topic-word distribution yields better topic quality, and the variational autoencoder accelerates training, which provides a degree of innovation and reference value for research on natural language processing problems.
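A minimal illustrative sketch (Python, not the authors' released code) of the two technical steps the abstract describes: boosting the topic-word weights of words whose embeddings are close to a topic's top words, and the Laplace approximation that maps a Dirichlet prior to a logistic-normal one so the topic distribution can be used inside a variational autoencoder. The function names, thresholds, vocabulary size, and embedding dimensionality below are assumptions for illustration, not values from the paper.

import numpy as np

def boost_topic_with_word_vectors(topic_word_probs, embeddings,
                                  n_seed_words=5, sim_threshold=0.6, boost=1.5):
    """Raise the probability of words whose embeddings are close to the
    topic's current top words, then renormalize (hypothetical parameters)."""
    top_ids = np.argsort(topic_word_probs)[::-1][:n_seed_words]
    boosted = topic_word_probs.copy()
    norms = np.linalg.norm(embeddings, axis=1)
    for seed in top_ids:
        seed_vec = embeddings[seed]
        # cosine similarity between the seed word and every vocabulary word
        sims = embeddings @ seed_vec / (norms * np.linalg.norm(seed_vec) + 1e-12)
        boosted[sims >= sim_threshold] *= boost
    return boosted / boosted.sum()

def dirichlet_laplace_approx(alpha):
    """Laplace approximation of Dirichlet(alpha) in the softmax basis:
    returns the mean and diagonal variance of the logistic-normal prior
    commonly used when training VAE-based topic models."""
    alpha = np.asarray(alpha, dtype=float)
    K = alpha.size
    mu = np.log(alpha) - np.log(alpha).mean()
    var = (1.0 / alpha) * (1.0 - 2.0 / K) + (1.0 / K ** 2) * (1.0 / alpha).sum()
    return mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size, embed_dim, n_topics = 2000, 100, 10
    embeddings = rng.normal(size=(vocab_size, embed_dim))   # stand-in word vectors
    topic = rng.dirichlet(np.ones(vocab_size))              # one topic-word distribution
    topic = boost_topic_with_word_vectors(topic, embeddings)
    mu, var = dirichlet_laplace_approx(np.full(n_topics, 0.02))  # sparse symmetric prior
    print(topic.sum(), mu.round(2), var.round(2))

The Laplace approximation here follows the standard logistic-normal treatment of the Dirichlet prior used in VAE topic models; in practice the boosting step would be applied to each topic at every training iteration, with the seed-word count and similarity threshold tuned to the corpus.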
Authors: ZHANG Qing; HAN Lixin; GOU Zhinan (College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100, China)
Source: Hebei Journal of Industrial Science and Technology (《河北工业科技》, CAS), 2018, No. 6, pp. 441-447 (7 pages)
Funding: Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX17_0486); Fundamental Research Funds for the Central Universities (2017B708X14); Hebei Province Human Resources and Social Security Research Project (JRSHZ-2018-08018)
Keywords: computer neural network; topic model; word vector; variational autoencoder; short text