Joint incremental topic modeling by fusing text and link
Abstract: Building on link-based probabilistic latent semantic analysis (link-PLSA), this paper proposes an incremental topic modeling method that fuses text and links. Topic modeling is first performed on the original collection of web pages. Then, as the structure and content of the pages change, an update mechanism revises the model parameters so that newly arriving documents and links are integrated into the original model, which allows the dynamic changes of an online web page stream to be handled efficiently. In addition, an adaptive asymmetric learning approach is proposed to fuse the latent topics of the text and link modalities: for each web page, its topic distributions over the two modalities are combined by weighting, with the weights determined by the entropy of the page's word distribution. Because the fused probabilistic structure properly associates the information in the link and text modalities, it yields better topic models. Experiments on two data sets with different link structures show that the approach saves time and considerably improves web page classification performance. Topic visualizations generated by the model are also presented.
Authors: MA Hui-fang (马慧芳), WANG Bo (王博)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journals, 2012, No. 4, pp. 1289-1293 (5 pages)
Funding: Northwest Normal University Young Teachers' Research Capability Improvement Program (NWNU-LKQN-10-1 SKQNGG10018)
Keywords: topic models; incremental learning; link-PLSA; adaptive asymmetric learning; adaptive link-IPLSA
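The entropy-weighted fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual formula, which the abstract does not give. The function names `normalized_entropy` and `fuse_topics`, the normalization of the entropy to [0, 1], and the choice to give the text modality less weight when the page's word distribution is flatter are all assumptions made for illustration.

```python
import math


def normalized_entropy(word_probs):
    """Shannon entropy of a word distribution, scaled to [0, 1]."""
    if len(word_probs) <= 1:
        return 0.0
    # Zero-probability words contribute nothing to the entropy.
    h = -sum(p * math.log(p) for p in word_probs if p > 0)
    return h / math.log(len(word_probs))


def fuse_topics(text_topics, link_topics, word_probs):
    """Blend a page's text and link topic distributions.

    Assumption (not from the paper): a flatter, higher-entropy word
    distribution is treated as less informative, so the text modality
    receives the weight 1 - normalized entropy.
    """
    w_text = 1.0 - normalized_entropy(word_probs)
    fused = [
        w_text * t + (1.0 - w_text) * l
        for t, l in zip(text_topics, link_topics)
    ]
    # Renormalize to guard against floating-point drift.
    total = sum(fused)
    return [f / total for f in fused]


if __name__ == "__main__":
    text_topics = [0.7, 0.3]       # hypothetical P(z | page) from text
    link_topics = [0.2, 0.8]       # hypothetical P(z | page) from links
    word_probs = [0.5, 0.3, 0.2]   # hypothetical word distribution
    print(fuse_topics(text_topics, link_topics, word_probs))
```

Under this assumed weighting, a page whose words are spread uniformly falls back entirely on its link-based topics, while a page dominated by a single word relies entirely on its text-based topics. The direction and shape of the real weighting should be taken from the full paper.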
