期刊文献+

一种基于LDA主题模型的话题发现方法 被引量:21

A LDA Model Based Topic Detection Method
下载PDF
导出
摘要 话题发现是提取热点话题并掌握其演化规律的关键技术之一。针对社交网络中海量短文本信息具有高维性导致主题模型难以处理以及主题分布不均导致主题不明确的问题,提出一种基于LDA(latent dirichlet allocation)主题模型的CBOW-LDA主题建模方法,通过引入基于CBOW(continuous bag-of-word)模型的词向量化方法对目标语料进行相似词的聚类,能够有效降低LDA模型输入文本的维度,并且使主题更明确。通过在真实数据集上计算分析,与现有基于词频权重的词向量化LDA方法相比,在相同主题词数情况下困惑度可降低约3%。 Topic Detection is one of the most important techniques in hot topic extraction and evolution tracking. Due to the high dimensionality problem which hinders processing efficiency and topics mal-distribution problem which makes topics unclear, it is difficult to detect topics from a large number of short texts in social network. To address these challenges, we proposed a new LDA ( Latent Dirichlet Allocation) model based topic detection meth-od called CBOW-LDA topic modeling method. It utilizes a CBOW( Continuous Bag-of-Word) method to cluster the words, which generate word vectors and clustering by vectors similarity. This method decreases the dimensions of LDA output, and makes topic more clearly. Through the analysis of topic perplexity in the real-world dataset, it is obvious that topics detected by our method has a lower perplexity, comparing with word frequency weighing based vectors. In a condition of same number of topic words, perplexity is reduced by about 3%.
出处 《西北工业大学学报》 EI CAS CSCD 北大核心 2016年第4期698-702,共5页 Journal of Northwestern Polytechnical University
基金 国家自然科学基金(61402373 61303224 61403311) 航空科学基金(20155553036 2013ZC53034)资助
关键词 词向量 LDA模型 话题发现 困惑度 word vectors LDA model topic detection perplexity
  • 相关文献

参考文献9

  • 1Cheng Xueqi, Yan Xiaohui, Lan Yanyan, et al. BTM: Topic Modeling Over Short Texts[J]. IEEE Trans on Knowledge and Data Engineering, 2014, 26(12): 2928-2941.
  • 2Mikolow Tomas, Yih Wentau Scott, Zweiq Geoffery. Linguistic Reqularities in Contrmcous Space Word Representations[C]//Proceedings of the 12nd Conference of the North Anerican Chapter of the Association for Computational Linguistics, Atlanta, USA: NAACL, 2013.
  • 3Dermouche M, Velcin J, Khouas L, et al. A Joint Model for Topic-Sentiment Evolution Over Time[C]//Proceedings of 14th IEEE International Conference on Data Mining. Shenzhen, China, 2014.
  • 4Huang Bo, Yang Yan, Mahmood Amjad, et al. Microblog Topic Detection Based on LDA Model and Single-Pass Clustering[C]//Proceedings of 7th International Conference on Rough Sets and Current Trends in Computing. Chengdu, China, 2012.
  • 5Darling M William, Song Fei. Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA[J]. ArXiv:1303.2826, 2013.
  • 6Bai Xue, Chen Fu, Zhan Shaobin. A New Clustering Model Based on Word2vec Mining on Sina Weibo Users' Tags[J]. International Journal of Grid Distribution Computing, 2014, 7(3): 41-48.
  • 7Zhou Xinjie, Wan Xiaojun, Xiao Jianguo. Repre-Sntation Learning for Aspect Category Detection in Online Reviews[C]. Proceedings of the 29th AAAI Conference on Artificial Intelligence. Austin, Texas, USA, 2015.
  • 8Mikolov Tomas, Sutskever Hya. Distributed Representutions of Words and Phrases and Their Compositionality[C]//Proceedings of the Ilth Newral Information Processing Systems Conference Lake Tahoe, USA: NIPS, 2013.
  • 9Cao Ziqiang, Li Sujian, Liu Yang, et al. A Novel Neural Topic Model and Its Supervised Extension[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Austin, Texas, USA, 2015.

同被引文献176

引证文献21

二级引证文献68

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部