Abstract
Social networks, microblogs in particular, have become strategic ground for public opinion. Discovering the latent topic communities in a social network is valuable for commercial promotion and public opinion monitoring. In recent years, the probabilistic generative topic model LDA (Latent Dirichlet Allocation) has been widely applied in data mining. In general, however, LDA is suited to text and digital signal data and, without modification, cannot properly handle the relationship data between users of a social network. This paper modifies LDA to propose Tri-LDA, a model suited to user relationship data, and applies it to mine the topic communities hidden in a social network. Experimental results show that the topic communities learned with Tri-LDA are largely consistent with the real topic communities hand-labeled by the authors.
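The abstract does not describe the internals of Tri-LDA, so the following is only a minimal sketch of the unmodified LDA baseline it starts from, fitted by collapsed Gibbs sampling. Treating each user's list of followed accounts as a "document" is an assumption made here for illustration, not the authors' method; the function name `lda_gibbs`, the hyperparameter values, and the toy data are likewise hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_items, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not Tri-LDA).

    docs: list of lists of item ids. In this illustration each "document"
    is one user and each item id is an account that user follows.
    Returns (doc_topic, topic_item) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]             # per-user topic counts
    topic_item = [[0] * n_items for _ in range(n_topics)]  # per-topic item counts
    topic_total = [0] * n_topics                           # tokens assigned to each topic
    assign = []                                            # topic of every token

    # Random initial topic assignment for every (user, followed account) token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_item[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_item[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_item[k][w] + beta)
                    / (topic_total[k] + n_items * beta)
                    for k in range(n_topics)
                ]
                # Sample a new topic and add the token back.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_item[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_item


# Toy data (hypothetical): four users, six accounts they may follow.
follows = [[0, 1, 2, 0, 1], [1, 2, 0], [3, 4, 5, 3], [4, 5, 3, 5]]
doc_topic, topic_item = lda_gibbs(follows, n_topics=2, n_items=6)

# The most frequent topic per user serves as a crude "community" label.
communities = [max(range(2), key=lambda k: row[k]) for row in doc_topic]
print(communities)
```

Because the sampler is stochastic, the community labels depend on the seed and the topic indices themselves are arbitrary; only the grouping of users is meaningful.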
Source
Computer and Modernization (《计算机与现代化》)
2014, No. 8, pp. 21-25, 29 (6 pages in total)
Funding
Science and Technology Innovation Project of Guangdong Higher Education Institutions (2013KJCX0117)