

Dirichlet Process Based Hierarchy Topic Model
Abstract: As the number of microblog users grows, the volume of posts is increasing rapidly, which makes mining the hot topics that users care about from this mass of data a serious challenge. When a user's post is forwarded by friends, and then by friends of friends, it spreads through the network as a tree, and the content of that tree constitutes a topic. However, when such a tree is constructed, the number of passes over the microblog data depends on the height of the tree, which is infeasible at microblog scale. This paper proposes a topic network construction model based on a group of trees, which builds the topic subtrees in only a few passes over the data, and then analyzes the latent topics among the subtrees with the LDA model. Experiments show that the proposed Dirichlet process based hierarchy topic model outperforms related work in both accuracy and computational efficiency.
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal (北大核心), 2013, No. 27, pp. 8192-8196 (5 pages)
Keywords: Dirichlet; topic; hierarchy; model; algorithm
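
The abstract does not spell out how the topic subtrees are built, so the following is only a minimal sketch of one way to group forwarded posts into trees without a number of passes that grows with tree height. It uses a union-find structure, which is a stand-in technique and not necessarily the paper's method. The record layout (`post_id`, `parent_id`) and the function names `find` and `group_into_trees` are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's tree, compressing the path."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second walk: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def group_into_trees(posts):
    """Group (post_id, parent_id) records into forwarding trees.

    Assumption (not from the paper): parent_id is the id of the post
    being forwarded, or None for an original post. Records may arrive
    in any order; the input is scanned once, independent of tree height.
    """
    parent = {}
    for post_id, parent_id in posts:
        parent.setdefault(post_id, post_id)
        if parent_id is not None:
            parent.setdefault(parent_id, parent_id)
            # Attach the forwarding post's tree under the forwarded post's tree.
            parent[find(parent, post_id)] = find(parent, parent_id)

    trees = defaultdict(list)
    for post_id in parent:
        trees[find(parent, post_id)].append(post_id)
    return dict(trees)


if __name__ == "__main__":
    # Hypothetical records: c forwards b, b forwards a; y forwards x.
    records = [("c", "b"), ("a", None), ("b", "a"), ("x", None), ("y", "x")]
    for root, members in group_into_trees(records).items():
        print(root, sorted(members))
```

Under these assumptions, the text of the posts in each group could then be concatenated into one document per tree and passed to a topic model such as LDA, as the abstract describes for the analysis step.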








