
Research on Microblog Hot Topic Discovery Based on the Improved BBTM Algorithm

Cited by: 5
Abstract: [Research purpose] Mainstream topic discovery models suffer from data sparsity and high dimensionality. To address these problems, this paper proposes BiLSTM-HBBTM, a microblog hot topic discovery method that improves on the bursty biterm topic model (BBTM), with the aim of achieving better results in microblog hot topic mining. [Research method] First, the microblog propagation value, the term H-index, and the biterm bursty probability are introduced to perform feature selection at both the document level and the word level, which addresses data sparsity and high dimensionality. Second, a bidirectional long short-term memory network (BiLSTM) is trained to capture the relationships between words, and these are combined with the inverse document frequency of words as prior knowledge for biterms, so that relationships between words are no longer ignored. Third, a density-based method adaptively selects the optimal number of topics for BBTM, removing the need to specify the number of topics manually as traditional topic models require. Finally, a real microblog dataset is used to evaluate the method on three aspects: hot topic discovery accuracy, topic quality, and consistency. [Research conclusion] Experiments show that BiLSTM-HBBTM outperforms the comparison models on multiple evaluation metrics, which verifies the effectiveness and feasibility of the proposed model.
Authors: Xiang Zhuoyuan; Wu Yu; Chen Hao; Zhang Fuwei (School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073)
Source: Journal of Intelligence (《情报杂志》; indexed in CSSCI and the Peking University Core Journals list), 2022, No. 1, pp. 104-112 (9 pages)
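Funding: One of the research outputs of the National Natural Science Foundation of China General Program project "Research on Domain Knowledge Representation and Fusion Models for Cross-Lingual Opinion Summarization" (No. 71974202).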
Keywords: hot topic discovery; topic model; microblog; short texts; BiLSTM; BBTM; Word2Vec
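
The abstract names three feature-selection signals (propagation value, term H-index, biterm bursty probability) without giving their formulas. The following is a minimal sketch of how two of them might be computed, not the authors' implementation. It assumes that a term's H-index is the largest h such that the term occurs in at least h posts whose propagation value is at least h, and that the bursty probability of a biterm is the share of its current count that exceeds its mean count over earlier time slices, floored at a small constant. The function names, the `eps` value, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) of one short text.

    Simplification: repeated words are collapsed before pairing.
    """
    return list(combinations(sorted(set(tokens)), 2))


def count_biterms(posts):
    """Count biterms over the tokenized posts of one time slice."""
    counts = Counter()
    for tokens in posts:
        counts.update(extract_biterms(tokens))
    return counts


def term_h_index(propagation_values):
    """Assumed definition: largest h such that the term occurs in at
    least h posts whose propagation value is at least h."""
    h = 0
    for rank, value in enumerate(sorted(propagation_values, reverse=True), start=1):
        if value < rank:
            break
        h = rank
    return h


def bursty_probability(count_now, past_counts, eps=0.01):
    """Assumed definition: fraction of the current count that exceeds
    the mean count in earlier slices, floored at eps before dividing."""
    if count_now == 0:
        return 0.0
    baseline = sum(past_counts) / len(past_counts) if past_counts else 0.0
    return max(count_now - baseline, eps) / count_now


# Toy data (hypothetical): two time slices of tokenized posts.
previous_slice = [["weather", "rain"], ["earthquake", "news"]]
current_slice = [["earthquake", "rescue"], ["earthquake", "rescue", "team"]]

now = count_biterms(current_slice)
before = count_biterms(previous_slice)
for biterm, n in now.items():
    eta = bursty_probability(n, [before[biterm]])
    print(biterm, n, round(eta, 3))
```

Under these assumptions, a biterm that is frequent now but was rare in earlier slices gets a bursty probability close to 1, while a biterm that is no more frequent than usual gets a value close to 0. Terms with a low H-index or biterms with a low bursty probability could then be filtered out before topic modeling, which is one way to read the document-level and word-level selection the abstract describes.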
