
A Partial Comparative Mixture Model for Multi-Collection Documents (cited by: 2)
Abstract: State-of-the-art cross-collection topic models (CTM models) share a major flaw: they can analyze only the topics that are common across document collections. To address this, we propose PCCMix (Partial Comparative Cross Collections Mixture), a mixture model for multi-collection CTM that detects both common topics and collection-specific topics. The model divides the topics in multiple document collections into these two types. It first builds, from the text data, a probability distribution over all words for each type of topic, and then estimates the model parameters with the Expectation-Maximization (EM) algorithm. Experimental results show that PCCMix not only reveals how common topics differ across collections but also analyzes the topics specific to each collection. The model represents document collections more precisely than the two main CTM models and performs well.
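The split between common and collection-specific word distributions, fitted by EM, can be illustrated with a deliberately simplified sketch. This is not the PCCMix model, whose formulation is not given in this record. It assumes a single common topic, one specific topic per collection, and a fixed mixing weight. The function name `em_common_vs_specific`, the parameter `lam`, and the toy word counts are all hypothetical.

```python
def em_common_vs_specific(counts, lam=0.5, iters=50):
    """Toy EM: each token in a collection comes from a shared 'common'
    word distribution (probability lam) or from that collection's own
    'specific' distribution (probability 1 - lam).

    counts: list of dicts, one per collection, mapping word -> count.
    Returns (common, specific), where specific is a list of dicts.
    """
    vocab = sorted({w for c in counts for w in c})

    # Initialise common from pooled counts and each specific distribution
    # from its own collection, with add-one smoothing.
    pooled = {w: sum(c.get(w, 0) for c in counts) + 1.0 for w in vocab}
    total = sum(pooled.values())
    common = {w: v / total for w, v in pooled.items()}
    specific = []
    for c in counts:
        s = {w: c.get(w, 0) + 1.0 for w in vocab}
        total = sum(s.values())
        specific.append({w: v / total for w, v in s.items()})

    for _ in range(iters):
        common_acc = {w: 0.0 for w in vocab}
        new_specific = []
        for c, spec in zip(counts, specific):
            spec_acc = {w: 0.0 for w in vocab}
            for w, n in c.items():
                # E-step: posterior that a token of w came from the common topic.
                pc = lam * common[w]
                ps = (1.0 - lam) * spec[w]
                r = pc / (pc + ps)
                common_acc[w] += n * r
                spec_acc[w] += n * (1.0 - r)
            # M-step for this collection's specific distribution.
            total = sum(spec_acc.values())
            new_specific.append({w: v / total for w, v in spec_acc.items()})
        # M-step for the common distribution.
        total = sum(common_acc.values())
        common = {w: v / total for w, v in common_acc.items()}
        specific = new_specific
    return common, specific


# Toy data (hypothetical): two collections sharing "topic" and "model".
collections = [
    {"topic": 10, "model": 10, "laptop": 10},
    {"topic": 10, "model": 10, "cancer": 10},
]
common, specific = em_common_vs_specific(collections)
```

On this toy input, words shared by both collections should receive more mass in `common`, while each collection's unique word should dominate its own entry in `specific`. A fuller model would add multiple topics, a background distribution, and learned mixing weights.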
Source: Journal of Hunan University (Natural Sciences) (《湖南大学学报(自然科学版)》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, No. 11, pp. 101-107 (7 pages)
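Funding: National Natural Science Foundation of China (60903225); Natural Science Foundation of Hunan Province (11JJ5044); Outstanding Graduate Student Innovation Fund of the National University of Defense Technology (S100502)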
Keywords: probability distributions; comparative text mining; partial comparability; PCCMix (Partial Comparative Cross Collections Mixture) model; mixture model

