
Topic Detection from Sina Microblog Based on Improved Latent Dirichlet Allocation Algorithm

(Original Chinese title: 基于改进隐式狄利克雷分布算法的新浪微博话题检测)
Abstract: This paper presents a microblog topic detection algorithm based on an improved topic model. Traditional topic models are designed mainly for traditional media text and perform poorly on short text. To account for the data structure specific to microblog text, a preprocessing method that weights forwarding (repost) features and comment features is applied before text clustering. On this basis, the traditional latent Dirichlet allocation (LDA) topic model is improved and used to extract topics from hot microblog data. Experiments show that, compared with traditional methods, the improved topic model resolves the high-dimensional sparsity problem that traditional topic detection methods encounter when applied to short text.
Source: Industrial Control Computer (《工业控制计算机》), 2017, No. 12, pp. 37-38, 41 (3 pages)
Keywords: latent Dirichlet allocation; Sina Weibo; topic detection
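
The abstract describes weighting repost and comment features before clustering but does not give the formula. The following is a minimal sketch of one way such a preprocessing step might look, not the authors' method. The logarithmic weight, the `alpha` and `beta` parameters, the field names, and the sample posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def interaction_weight(reposts, comments, alpha=1.0, beta=1.0):
    """Hypothetical post weight (assumption, not from the paper):
    grows logarithmically with repost and comment counts, and equals
    1.0 for a post with no interactions."""
    return 1.0 + alpha * math.log1p(reposts) + beta * math.log1p(comments)


def weighted_term_counts(posts):
    """Scale each post's raw term counts by its interaction weight.

    The result is one {term: weighted count} dict per post, which could
    then be passed to a clustering step or a topic model.
    """
    rows = []
    for post in posts:
        w = interaction_weight(post["reposts"], post["comments"])
        counts = Counter(post["tokens"])
        rows.append({term: n * w for term, n in counts.items()})
    return rows


# Illustrative, made-up posts (already tokenized).
posts = [
    {"tokens": ["haze", "beijing", "haze"], "reposts": 0, "comments": 0},
    {"tokens": ["haze", "alert"], "reposts": 120, "comments": 45},
]
rows = weighted_term_counts(posts)
```

In this sketch a heavily reposted and commented post contributes more to the term statistics than an ignored one, which is one plausible reading of "feature weight preprocessing"; the logarithm keeps a few viral posts from dominating.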