
Topic mining based on the U_BTM model in social networks (cited by: 5)
Abstract: Existing topic models suffer from the feature sparsity of short texts when mining topics in social networks, which leads to low-quality topics that differ little from one another. To address this, the paper proposes a user-biterm topic model, U_BTM, based on the biterm topic model BTM. It first uses the K-means clustering algorithm to group short texts with similar topics into a single document, then models how biterms are generated according to the topics of the users in each document, and uses Gibbs sampling to infer the model parameters, finally obtaining the latent topics in the social network and the topic distribution of users. Experimental results show that U_BTM recovers the latent topics and the topic distribution of each user, and that, compared with other models, the mined topics differ more from one another and are of higher quality, with lower perplexity.
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core, 2017, No. 1, pp. 132-135, 146 (5 pages)
Funding: National Natural Science Foundation of China (71271117); Jiangsu Province Science and Technology Support Program (BE2011156)
Keywords: social network; topic model; short texts; topic mining
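
The abstract mentions two preprocessing ideas: pooling similar short texts into one document, and working with biterms (unordered word pairs that co-occur in a short text). The following is a minimal Python sketch of those two steps only, not the paper's implementation. The function names `extract_biterms` and `pool_by_cluster`, the toy texts, and the cluster labels are all hypothetical; in practice the labels would come from a clustering step such as K-means, and the sketch assumes biterms are taken from the original short texts.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) co-occurring in one short text."""
    # Sort each pair so that ("a", "b") and ("b", "a") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def pool_by_cluster(texts, labels):
    """Concatenate the short texts that share a cluster label into one pseudo-document."""
    pooled = {}
    for tokens, label in zip(texts, labels):
        pooled.setdefault(label, []).extend(tokens)
    return pooled


# Toy tokenized short texts and hypothetical cluster labels (assumed, e.g. from K-means).
texts = [["topic", "model", "text"], ["short", "text"], ["user", "network"]]
labels = [0, 0, 1]

# Corpus-wide biterm counts and the pooled pseudo-documents.
biterm_counts = Counter(b for tokens in texts for b in extract_biterms(tokens))
pooled_docs = pool_by_cluster(texts, labels)
```

A topic model over biterms would then assign a topic to each counted pair during Gibbs sampling; that inference step is not shown here because the page gives no details of the sampling equations.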
