Abstract
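On Microblog platforms, the user base keeps growing and the volume of user posts is increasing sharply. This makes mining the hot topics users care about from such massive amounts of information a serious challenge. As a user's post is forwarded by friends, it forms a tree structure in the network, and the content contained in that tree constitutes a topic. However, when building such a tree, the number of iterations over the Microblog data depends on the height of the tree, which is infeasible for massive Microblog data. A group-of-trees topic network construction model is proposed that needs only a few iterations to build the subtrees of a topic; the latent topics among the subtrees are then analyzed with an LDA model. Experiments show that the proposed Dirichlet process based hierarchical topic model outperforms existing related work in both accuracy and computational efficiency.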
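The claim that tree construction needs a number of iterations tied to the tree height can be illustrated with a minimal sketch of a naive, level-by-level construction. It assumes, hypothetically, that repost data arrives as `(post_id, parent_id)` records with `parent_id = None` for original posts, and that each iteration may only use assignments produced by earlier iterations (as in a batch job that rereads the data every round). The function name `build_trees_by_passes` is illustrative only; this is not the paper's group-of-trees model, whose details the abstract does not give.

```python
from typing import Dict, List, Optional, Tuple


def build_trees_by_passes(
    records: List[Tuple[str, Optional[str]]],
) -> Tuple[Dict[str, str], int]:
    """Naive repost-tree construction (hypothetical record format).

    Maps every post to the root post of its repost tree. Each pass scans all
    records but may only attach a post whose parent was resolved in an EARLIER
    pass, so the number of productive passes equals the number of levels
    (height + 1) of the tallest tree.
    """
    root_of: Dict[str, str] = {}
    rounds = 0
    while True:
        newly: Dict[str, str] = {}
        for post, parent in records:
            if post in root_of:
                continue  # already attached in a previous pass
            if parent is None:
                newly[post] = post  # an original post is its own root
            elif parent in root_of:
                newly[post] = root_of[parent]  # inherit the parent's root
        if not newly:
            break  # nothing changed: all reachable posts are attached
        root_of.update(newly)
        rounds += 1
    return root_of, rounds
```

For example, the records `[("c", "b"), ("b", "a"), ("a", None), ("x", None), ("y", "x")]` need three passes, because the chain a, b, c has three levels; a repost cascade hundreds of levels deep would need hundreds of full scans of the data.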
As the number of users grows, the volume of Microblog data grows rapidly, which poses a serious challenge for mining hot topics from Microblog data. A user's post can be forwarded by his or her friends, the friends of those friends can forward it in turn, and in this way a topic tree is constructed. However, constructing the tree requires a number of iterations over the Microblog data that depends on the height of the tree, which is infeasible at Microblog scale. A trees-based topic network construction model is proposed that needs only a few iterations over the Microblog data, and the latent topics among the trees are analyzed with the Dirichlet process. Experiments show that the proposed Dirichlet process based hierarchical topic model outperforms related work in both accuracy and efficiency.
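Why a Dirichlet process suits topic analysis can be sketched with its standard Chinese restaurant process (CRP) representation: unlike basic LDA, which fixes the number of topics K in advance, a DP prior lets new topics appear as more data arrives. The sketch below only samples topic labels from a CRP with an assumed concentration parameter `alpha`; the function name is hypothetical, and it is not the paper's hierarchical model, which the abstract does not specify.

```python
import random
from typing import List


def chinese_restaurant_process(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample topic labels for n items from a CRP with concentration alpha.

    Item i joins existing topic k with probability counts[k] / (i + alpha)
    and opens a new topic with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of items already in topic k
    labels: List[int] = []
    for _ in range(n):
        # Unnormalized weights: one per existing topic, plus one for a new topic.
        # random.choices normalizes by the total, which is i + alpha here.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new topic
        else:
            counts[k] += 1  # join an existing topic
        labels.append(k)
    return labels
```

Larger `alpha` values tend to produce more topics for the same number of items, which is the property that lets a DP-based model avoid choosing K by hand.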
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2013, No. 27, pp. 8192-8196 (5 pages)