Research on Micro-blog Bursty Event Detection Based on Bursty Topic Words and Agglomerative Hierarchical Clustering (Cited by: 7)

A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm
Abstract
[Objective] To detect bursty events in massive volumes of micro-blog posts in real time, accurately and efficiently, and so provide decision-support information for public opinion emergency management.
[Methods] A reference time window mechanism is introduced, and selection and computation methods are designed for four types of features: word frequency, document frequency, topic tags (Hashtags), and word frequency growth rate. Bursty topic words are then extracted using a dynamic threshold. Next, each micro-blog text is represented as a feature vector over the bursty topic words. Finally, bursty events are detected with an agglomerative hierarchical clustering algorithm.
[Results] When the experimental results were analyzed against real-world cases, bursty event detection reached 80% accuracy, which supports the feasibility and effectiveness of the method.
[Limitations] Because of limits on the corpus and the scope of the study, the detected events are not yet described automatically. The analysis of netizens' emotions and of semantic relationships among events is also incomplete.
[Conclusions] The study overcomes limitations of previous work concerning text content quality, text form, and the results of bursty feature extraction. It improves the efficiency of bursty event detection in micro-blogs.
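The abstract describes a two-stage pipeline. In the first stage, terms in a current time window are scored against a reference window using four features and then filtered with a dynamic threshold. In the second stage, posts are vectorized over the surviving bursty words and merged by agglomerative hierarchical clustering.

The exact feature formulas, weights, and thresholds are not given here. The Python sketch below is therefore an illustration under assumptions, not the authors' implementation. The following choices are all hypothetical:

- the function names `window_stats`, `bursty_words`, and `agglomerative_events`;
- normalizing each feature to [0, 1] and averaging them with equal weights;
- the dynamic threshold rule "mean + k × population standard deviation";
- binary vectors compared by cosine similarity;
- average linkage with a stopping similarity of 0.5.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

# Assumption: each post is a list of already-segmented tokens, and a token
# written as "#word" counts as a Hashtag occurrence of "word".


def window_stats(posts):
    """Term frequency, document frequency and Hashtag frequency for one time window."""
    tf, df, hf = Counter(), Counter(), Counter()
    for post in posts:
        words = [t.lstrip("#") for t in post]
        tf.update(words)
        df.update(set(words))
        hf.update(t.lstrip("#") for t in post if t.startswith("#"))
    return tf, df, hf


def bursty_words(current, reference, k=1.0):
    """Score each term of the current window against the reference window and keep
    those above a dynamic threshold (mean + k * population std of all scores).
    The four normalised features and their equal weighting are assumptions."""
    tf, df, hf = window_stats(current)
    ref_tf, _, _ = window_stats(reference)
    if not tf:
        return set()
    max_tf, n_docs = max(tf.values()), len(current)
    scores = {}
    for w, f in tf.items():
        growth = (f - ref_tf[w]) / (ref_tf[w] + 1)  # word frequency growth rate
        features = (
            f / max_tf,       # word frequency, relative to the window maximum
            df[w] / n_docs,   # document frequency
            hf[w] / f,        # share of occurrences that are Hashtags
            growth / (1 + growth) if growth > 0 else 0.0,  # growth squashed into [0, 1)
        )
        scores[w] = sum(features) / len(features)
    values = list(scores.values())
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return {w for w, s in scores.items() if s > threshold}


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets of bursty words."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def agglomerative_events(posts, words, sim_threshold=0.5, min_size=2):
    """Average-linkage agglomerative clustering of posts over the bursty-word vocabulary.
    Returns clusters (lists of post indices) containing at least min_size posts."""
    vectors = [{t.lstrip("#") for t in post} & words for post in posts]
    clusters = [[i] for i, v in enumerate(vectors) if v]  # drop posts with no bursty word

    def linkage(c1, c2):
        return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < sim_threshold:  # no pair is similar enough: stop merging
            break
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return [c for c in clusters if len(c) >= min_size]


if __name__ == "__main__":
    # Toy input for illustration only; not the paper's corpus.
    reference = [["weather", "sunny"], ["weather", "rain"], ["movie", "review"]]
    current = [
        ["#earthquake", "rescue", "city"], ["earthquake", "rescue", "donate"],
        ["#earthquake", "rescue"], ["#concert", "tickets"],
        ["concert", "tickets", "stadium"], ["weather", "sunny"], ["movie", "review"],
    ]
    words = bursty_words(current, reference)
    print(sorted(words))                         # ['concert', 'earthquake', 'rescue']
    print(agglomerative_events(current, words))  # [[0, 1, 2], [3, 4]]
```

On the toy input, the terms that are new relative to the reference window and partly tagged ("earthquake", "concert") score highest. The stable terms ("weather", "movie") fall below the threshold. Each resulting cluster corresponds to one candidate event.

Tying the threshold to the score distribution, rather than fixing a constant, lets it adapt as window sizes and vocabularies change, which is one plausible reading of "dynamic threshold". Stopping the merges at a similarity cutoff also means the number of events does not have to be known in advance.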
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, 2016, Issue 7, pp. 12-20 (9 pages)
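Funding: This paper is one of the research outputs of two projects. The first is the National Social Science Fund of China project "Research on Topic Discovery in Online Public Opinion Based on Social Network Analysis" (Grant No. 15BTQ063). The second is the National Social Science Fund of China Key Project "Research on the Methodological System for Social Public Opinion and Decision Support in the Big Data Environment" (Grant No. 14AZD084). The work was also supported by the Fundamental Research Funds for the Central Universities project "Research on an Innovative Knowledge Service System Based on Deep Integration in the Big Data Era and Its Operating Mechanism" (Grant No. 30916011330).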
Keywords: Bursty events detection; Bursty topic words; Agglomerative hierarchical clustering algorithm; Online public opinion; Micro-blog