
Analysis of Large-scale Topic Network Extraction Based on MapReduce

Abstract: As an important means of publishing and acquiring information, microblogging has become one of the most important media. Users post on microblogs every day, and these posts implicitly contain many important topics. In topic detection, constructing a topic network is the most basic step. Taking posts as nodes and the words shared between two posts as edges, and using the MapReduce programming model as the platform for massive data processing, this paper studies how to construct large-scale topic networks from microblog data. Experiments show that the topic network built with the MapReduce-based method conforms to the relevant properties of social networks, and that topic prediction based on this network is more accurate than LDA-based topic detection.
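The abstract does not spell out the algorithm. What follows is only a minimal sketch that assumes the common inverted-index pattern for this kind of construction.

- The map step emits (word, post ID) pairs.
- The shuffle groups posts by word.
- The reduce step emits an edge for every pair of posts that share that word.
- Edge contributions are summed, so each weight counts the shared words.

The MapReduce phases are simulated in plain Python in a single process. The names `map_phase`, `shuffle`, `reduce_phase`, and `build_topic_network`, and the sample posts, are hypothetical. The input is assumed to be already word-segmented with stop words removed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical single-process simulation of a MapReduce job; not the paper's code.
# Input posts are assumed to be pre-segmented into words, stop words removed.

def map_phase(post_id, words):
    """Map: emit (word, post_id) once for each distinct word in a post."""
    for word in set(words):
        yield word, post_id

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, post_ids):
    """Reduce: each pair of posts sharing this word gets one edge contribution."""
    for a, b in combinations(sorted(post_ids), 2):
        yield (a, b), 1

def build_topic_network(posts):
    """Return {(post_a, post_b): number_of_shared_words} for all linked post pairs."""
    mapped = (kv for pid, words in posts.items() for kv in map_phase(pid, words))
    grouped = shuffle(mapped)
    edges = defaultdict(int)
    for word, ids in grouped.items():
        for edge, count in reduce_phase(word, ids):
            edges[edge] += count  # summing plays the role of a second aggregation job
    return dict(edges)

if __name__ == "__main__":
    posts = {
        "p1": ["earthquake", "rescue", "sichuan"],
        "p2": ["earthquake", "sichuan"],
        "p3": ["worldcup", "football"],
    }
    print(build_topic_network(posts))  # {('p1', 'p2'): 2}
```

A word shared by k posts produces k(k-1)/2 pairs in its reducer, so very frequent words dominate the cost. A real Hadoop job would typically drop such words before pairing, for example with a document-frequency cutoff. The abstract does not say how the paper handles this.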
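Author: LIU Re (刘热)
Source: Journal of Huaihai Institute of Technology (Natural Sciences Edition), 2014, No. 2, pp. 40-44 (5 pages). Indexed in CAS.
Funding: Government-Industry-University-Research Cooperation (Joint Recommendation and Mutual Appointment) Science and Technology Project of Wuxi Professional College of Science and Technology (SG14029)
Keywords: topic network; extraction; MapReduce; algorithm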