LDA Algorithm Based on Dynamic Weight
Cited by: 6
Abstract: Latent Dirichlet allocation (LDA) is a popular three-layer probabilistic topic model that clusters documents, and the words within them, at the topic level. The model rests on the bag-of-words (BOW) assumption, under which every word is equally important. This simplifies modeling, but it biases the topic distributions toward high-frequency words and degrades the semantic coherence of the topics. To address this problem, an LDA algorithm based on dynamic weights is proposed. Its basic idea is that each word carries a different importance in modeling: during the iterative process, word weights are generated dynamically from each word's topic distribution and fed back into topic modeling, which reduces the influence of high-frequency words and raises the importance of keywords. Experiments on four public datasets show that the dynamic-weight LDA algorithm outperforms currently popular LDA inference algorithms in topic semantic coherence, text classification accuracy, generalization performance, and precision.
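The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a collapsed Gibbs sampler in which each token contributes its word's weight (instead of 1) to the counts, and it assumes a hypothetical weight of one minus the normalized entropy of p(topic | word), so that words spread evenly over topics (typically high-frequency words) are down-weighted. The names `topic_entropy_weights`, `fit`, and the `floor` parameter are illustrative assumptions.

```python
import math
import random


def topic_entropy_weights(n_kw, beta=0.01, floor=0.1):
    """Assumed weighting: 1 minus the normalized entropy of p(topic | word).

    n_kw[k][w] is the (weighted) count of word w assigned to topic k.
    """
    num_topics, vocab_size = len(n_kw), len(n_kw[0])
    weights = []
    for w in range(vocab_size):
        col = [n_kw[k][w] + beta for k in range(num_topics)]
        total = sum(col)
        entropy = -sum((c / total) * math.log(c / total) for c in col)
        norm = entropy / math.log(num_topics) if num_topics > 1 else 0.0
        # Words concentrated in few topics get weights near 1.
        weights.append(max(floor, 1.0 - norm))
    return weights


def fit(docs, vocab_size, num_topics, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling with per-word weights refreshed every sweep."""
    rng = random.Random(seed)
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    weights = [1.0] * vocab_size  # the first sweep is plain, unweighted LDA
    n_kw = [[0.0] * vocab_size for _ in range(num_topics)]
    for _ in range(iters):
        # Rebuild the weighted counts from the current assignments and weights.
        n_dk = [[0.0] * num_topics for _ in docs]
        n_kw = [[0.0] * vocab_size for _ in range(num_topics)]
        n_k = [0.0] * num_topics
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] += weights[w]
                n_kw[k][w] += weights[w]
                n_k[k] += weights[w]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wt = z[d][i], weights[w]
                # Remove the token's weighted contribution, then resample.
                n_dk[d][k] -= wt
                n_kw[k][w] -= wt
                n_k[k] -= wt
                probs = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=probs)[0]
                z[d][i] = k
                n_dk[d][k] += wt
                n_kw[k][w] += wt
                n_k[k] += wt
        # Feed the word-topic counts back into the weights for the next sweep.
        weights = topic_entropy_weights(n_kw, beta)
    return z, n_kw, weights
```

Calling `fit(docs, vocab_size, num_topics)` on documents encoded as lists of integer word ids returns the topic assignments, the weighted topic-word counts, and the final word weights. Rebuilding the counts at the start of each sweep keeps them consistent with the weights that were just updated.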
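Authors: JU Ya-ya (居亚亚), YANG Lu (杨璐), YAN Jian-feng (严建峰) (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2019, No. 8, pp. 260-265 (6 pages)
Funding: National Natural Science Foundation of China (61572339, 61272449); Key Project of the Jiangsu Province Science and Technology Support Program (BE2014005)
Keywords: Latent Dirichlet allocation; Topic model; Dynamic weight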
