
Research on Improved BBTM Model for Microblog Hot Topic Discovery
Cited by: 3
Abstract: Current methods for discovering hot topics in short microblog texts with topic models suffer from sparse features, high dimensionality, and the need to specify the number of topics manually. To address these problems, a hot topic discovery method based on an improved bursty biterm topic model (BBTM), called the hot topic-hot biterm topic model (H-HBTM), is proposed. First, word burst probabilities are used for feature selection, filtering out non-bursty words. Second, the hot burst probability of each microblog biterm (word pair) is computed by combining the burst and propagation characteristics of microblog texts, and is used as the prior probability of the BBTM. Finally, a density-based method adaptively selects the optimal number of topics for the BBTM, which determines the optimal BBTM used for hot topic discovery. Experiments on real microblog datasets show that H-HBTM finds the optimal topic model automatically, without a preset number of topics, and that the hot topics it discovers are of higher quality than those found by methods based on BBTM, the biterm topic model, and latent Dirichlet allocation.
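The abstract does not give the formula behind the first step (filtering non-bursty words by burst probability). The sketch below is one plausible reading, not the authors' implementation: it assumes a burst estimate in the style of the original BBTM, where the share of a word's current count that exceeds its mean count in earlier time slices is treated as its burst probability. The function names, the `eps` floor, and the `threshold` value are all hypothetical.

```python
from collections import Counter


def burst_probabilities(current_counts, history_counts, eps=0.01):
    """Estimate a burst probability for each word in the current time slice.

    current_counts: Counter mapping word -> count in the current slice.
    history_counts: non-empty list of Counters for earlier slices.
    eps: assumed small floor so no word gets a probability of exactly zero.
    """
    probs = {}
    for word, n_now in current_counts.items():
        if n_now <= 0:
            continue  # a word that does not occur cannot be bursty
        # Mean count of the word over the earlier slices (0 if never seen).
        n_mean = sum(h.get(word, 0) for h in history_counts) / len(history_counts)
        # Fraction of the current count that exceeds the historical mean.
        probs[word] = max(n_now - n_mean, eps) / n_now
    return probs


def select_bursty_words(current_counts, history_counts, threshold=0.5):
    """Keep only words whose burst probability reaches the (assumed) threshold."""
    probs = burst_probabilities(current_counts, history_counts)
    return {word for word, p in probs.items() if p >= threshold}


# Toy data: "weather" is steady across slices, "earthquake" spikes in the current one.
current = Counter({"weather": 10, "earthquake": 8})
history = [Counter({"weather": 9}), Counter({"weather": 11, "earthquake": 2})]
print(select_bursty_words(current, history))  # → {'earthquake'}
```

In this toy example "weather" matches its historical mean and is filtered out, while "earthquake" keeps a burst probability of 0.875. The hot burst probability and the density-based choice of topic number described in the abstract are not covered by this sketch.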
Authors: HUANG Chang; GUO Wenzhong; GUO Kun (College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou 350116, China; Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350116, China; Key Laboratory of Ministry of Education for Spatial Data Mining & Information Sharing, Fuzhou University, Fuzhou 350116, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 7, pp. 1102-1113 (12 pages)
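Funding: National Natural Science Foundation of China (Nos. 61300104, 61300103, 61672158); Fujian Province Outstanding Youth Science Foundation for Universities (No. JA12016); Program for New Century Excellent Talents in Fujian Province Universities (No. JA13021); Fujian Province Science Foundation for Distinguished Young Scholars (Nos. 2014J06017, 2015J06014); Fujian Province Science and Technology Innovation Platform Program (Nos. 2009J1007, 2014H2005); Natural Science Foundation of Fujian Province (Nos. 2013J01230, 2014J01232); Fujian Province University-Industry Cooperation Project (Nos. 2014H6014, 2017H6008)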
Keywords: hot topic detection; microblog; bursty biterm topic model (BBTM); topic model
