期刊文献+

一种面向词汇突发的连续时间主题模型 被引量:6

A Continuous-time Topic Model for Word Burstiness
下载PDF
导出
摘要 针对传统基于多项式分布的主题模型不能较好地刻画文档中词汇突发的现象,综合考虑文本集固有的时间信息,提出一种面向词汇突发的Dirichlet组合多项式(DCM)连续时间主题模型。采用DCM分布对文本集中的词汇突发现象进行建模,利用Beta分布刻画文本集中的时间特征,通过Gibbs采样和不动点迭代法实现模型参数的估计。实验结果表明,在预设主题数目较少的情况下,与To T和DCMLDA模型相比,该模型具有明显的泛化性能优势,并且可以有效揭示出文本集中潜在的主题演化趋势。 To solve the problem that traditional topic models based on multinomial distribution cannot properly capture the condition of word burstiness,a continuous-time topic model with Dirichlet Compound Multinomial(DCM)for word burstiness is proposed,which integrates inherent temporal information in the corpus.In this model,the phenomenon of word burstiness is modeled by DCM distribution,while temporal features are characterized by Beta distribution.Gibbs sampling and fixed-point iteration method are employed to estimate the parameters in the model.Experimental results demonstrate that the model has obvious advantages over ToT and DCMLDA in terms of generalization performance when the given number of topics is small,and it can also effectively reveal the latent evolutions of topics in the corpus.
出处 《计算机工程》 CAS CSCD 北大核心 2016年第11期195-201,共7页 Computer Engineering
基金 国家自然科学基金(61462022)
关键词 主题模型 潜在Dirichlet分配 词汇突发 Dirichlet组合多项式 GIBBS采样 不动点迭代法 topic model Latent Dirichlet Allocation (LDA) word burstiness Dirichlet Compound Multinomial (DCM) Gibbs sampling fixed-point iteration method
  • 相关文献

参考文献3

二级参考文献82

  • 1Deerwester S C, Dumais S T, Landauer T K, et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
  • 2Hofmann T. Probabilistic latent semantic indexing//Proceedings of the 22nd Annual International SIGIR Conference. New York: ACM Press, 1999:50-57.
  • 3Blei D, Ng A, Jordan M. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003, 3: 993-1022.
  • 4Griffiths T L, Steyvers M. Finding scientific topics//Proceedings of the National Academy of Sciences, 2004, 101: 5228 5235.
  • 5Steyvers M, Gritfiths T. Probabilistic topic models. Latent Semantic Analysis= A Road to Meaning. Laurence Erlbaum, 2006.
  • 6Teh Y W, Jordan M I, Beal M J, Blei D M. Hierarchical dirichlet processes. Technical Report 653. UC Berkeley Statistics, 2004.
  • 7Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977, B39(1): 1-38.
  • 8Bishop C M. Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.
  • 9Roweis S. EM algorithms for PCA and SPCA//Advances in Neural Information Processing Systems. Cambridge, MA, USA: The MIT Press, 1998, 10.
  • 10Hofmann T. Probabilistic latent semantic analysis//Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Stockholm, Sweden, 1999:289- 296.

共引文献295

同被引文献71

引证文献6

二级引证文献36

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部