Abstract
[Objective] This paper proposes a method to detect bursty events in massive volumes of micro-blog posts accurately and efficiently in real time, providing decision support for public opinion emergency management. [Methods] We introduce a reference time window mechanism and design selection and calculation methods for four types of features: word frequency, document frequency, topic tags (Hashtags), and word frequency growth rate. Bursty topic words are then extracted using a dynamic threshold. On this basis, each micro-blog post is represented as a feature vector of bursty topic words, and bursty events are detected with an agglomerative hierarchical clustering algorithm. [Results] Analysis of the experimental results against real-world cases shows that bursty event detection reaches an accuracy of 80%, which supports the feasibility and effectiveness of the method. [Limitations] Owing to limits in the corpus and the scope of the study, the detected events are not yet described automatically, and factors such as users' sentiment and the semantic relationships among events are not fully analyzed. [Conclusions] This study overcomes limitations of previous work concerning text content quality, text form, and the results of bursty feature extraction, and improves the efficiency of bursty event detection in micro-blogs.
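The pipeline summarized in the abstract (reference time window, four features, dynamic threshold, clustering over bursty-word vectors) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the growth-rate formula, the multiplicative scoring, the hashtag weight, the "mean plus k standard deviations" threshold, the binary vectors, the average-linkage cosine similarity, and all function and parameter names. Posts are assumed to be pre-tokenized, with hashtag tokens starting with `#`.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def bursty_words(current_docs, reference_docs, k=0.5, hashtag_weight=2.0):
    """Score words in the current window against a reference window.

    All formulas below are illustrative assumptions, not the paper's.
    """
    if not current_docs:
        return set()
    cur_tf = Counter(t for d in current_docs for t in d)       # word frequency
    cur_df = Counter(t for d in current_docs for t in set(d))  # document frequency
    ref_tf = Counter(t for d in reference_docs for t in d)
    n_docs = len(current_docs)
    scores = {}
    for word, tf in cur_tf.items():
        # Assumed growth rate relative to the reference window (+1 smoothing).
        growth = max((tf - ref_tf[word]) / (ref_tf[word] + 1), 0.0)
        weight = hashtag_weight if word.startswith("#") else 1.0
        scores[word] = tf * (cur_df[word] / n_docs) * weight * growth
    # Assumed dynamic threshold: recomputed from the score distribution per window.
    threshold = mean(scores.values()) + k * pstdev(scores.values())
    return {w for w, s in scores.items() if s > threshold}


def to_vector(doc, vocab):
    """Binary feature vector over the bursty-word vocabulary."""
    tokens = set(doc)
    return [1.0 if w in tokens else 0.0 for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors, min_sim=0.5):
    """Average-linkage agglomerative clustering; stops below min_sim."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        return mean(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        sim, i, j = max(
            (avg_sim(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if sim < min_sim:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Toy data (invented for illustration only).
reference = [["weather", "nice"], ["lunch", "nice"], ["weather", "lunch"]]
current = [
    ["earthquake", "#quake", "weather"],
    ["earthquake", "#quake", "help"],
    ["earthquake", "rescue"],
    ["lunch", "nice"],
]
vocab = sorted(bursty_words(current, reference))
events = agglomerative([to_vector(d, vocab) for d in current])
```

In this sketch each resulting cluster of post indices stands for one candidate event. Posts that contain no bursty word end up in singleton clusters and would normally be filtered out before reporting.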
Source
《现代图书情报技术》
CSSCI
2016, No. 7, pp. 12-20 (9 pages)
New Technology of Library and Information Service
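Funding
National Social Science Fund of China project "Research on Topic Discovery of Online Public Opinion Based on Social Network Analysis" (Grant No. 15BTQ063)
One of the research outcomes of the National Social Science Fund of China key project "Research on the Methodological System of Public Opinion and Decision Support in the Big Data Environment" (Grant No. 14AZD084)
Fundamental Research Funds for the Central Universities project "Research on an Innovative Knowledge Service System Based on Deep Integration and Its Operating Mechanism in the Big Data Era" (Grant No. 30916011330)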
Keywords
Bursty events detection
Bursty topic words
Agglomerative hierarchical clustering
Public opinion
Micro-blog