Abstract
Existing topic models, when mining topics in social networks, are often affected by the feature sparsity of the short texts found there, which leads to low-quality topics with little difference between them. To address this problem, this paper proposes a user-biterm topic model, U_BTM, based on the biterm topic model (BTM). The model first uses the K-means clustering algorithm to group short texts with similar topics into a single document, then models the generation of biterms according to the topics of the users in each document, and uses Gibbs sampling to infer the model parameters, finally obtaining the latent topics in the social network and the topic distribution of each user. Experimental results show that U_BTM recovers the latent topics and each user's topic distribution, and that compared with other models its topics are more distinct and of higher quality, with lower perplexity.
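The first two steps named in the abstract, pooling clustered short texts into documents and forming biterms (unordered word pairs), can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so everything here is an assumption: the cluster labels are taken as already computed (for example by K-means), the helper names `pool_by_cluster` and `extract_biterms` are hypothetical, and the toy data is invented. Pairing every word in a pooled document is a simplification, and Gibbs sampling is not shown.

```python
from collections import defaultdict
from itertools import combinations


def pool_by_cluster(texts, labels):
    """Concatenate the tokens of all short texts that share a cluster label."""
    pooled = defaultdict(list)
    for tokens, label in zip(texts, labels):
        pooled[label].extend(tokens)
    return dict(pooled)


def extract_biterms(tokens):
    """Return every unordered pair of two different words in a token list."""
    return [
        tuple(sorted(pair))
        for pair in combinations(tokens, 2)
        if pair[0] != pair[1]  # skip pairs made of the same word twice
    ]


# Hypothetical toy data: tokenized short texts and assumed cluster labels.
texts = [["topic", "model"], ["short", "text"], ["model", "sparse"]]
labels = [0, 1, 0]

pooled = pool_by_cluster(texts, labels)
biterms = {label: extract_biterms(tokens) for label, tokens in pooled.items()}
```

Pooling before pairing is what counters sparsity in this sketch: a pooled document yields word pairs that span several short texts, whereas a two-word text yields only one pair by itself.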
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal
2017, No. 1, pp. 132-135, 146 (5 pages)
Funding
National Natural Science Foundation of China (71271117)
Jiangsu Province Science and Technology Support Program (BE2011156)