Abstract
Mainstream hotspot-tracking algorithms are built on text clustering. When processing massive numbers of web pages, clustering struggles to converge precisely on topic centers, so its results fall far short of the hotspots actually needed. In addition, the monitoring range of existing Internet public opinion monitoring and warning systems is limited to the keywords supplied by the user, which prevents these systems from detecting emergent events the user does not know about in advance. To address this, the information acquisition strategy is improved based on how Internet public opinion arises and spreads. Because the titles of web pages related to public opinion state their topics clearly, a method is proposed that automatically mines hotspot keywords from these titles and clusters topics by keyword. Separate hotspot analysis models are designed for news sites, forums, and blogs according to their different characteristics. On this basis, an Internet public opinion monitoring system is designed and implemented. Results from running the system show that the scheme can discover hot topics in a timely manner and track emergent events in real time.
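The abstract names two steps: mining hotspot keywords from page titles, and then clustering topics by those keywords. It does not say how keywords are scored. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes a keyword is "hot" when its title document frequency in the current time window greatly exceeds its smoothed frequency in a baseline window. It then assigns each title to the highest-ranked hot keyword it contains.

The function names, the `min_count`/`min_ratio` thresholds, and the regex tokenizer are all hypothetical. Real Chinese titles would need a proper word segmenter instead of the regex split.

```python
import re
from collections import Counter, defaultdict


def tokenize(title):
    # Hypothetical placeholder tokenizer: splits on non-word characters and
    # drops 1-character tokens. Chinese text would need a real segmenter.
    return [t.lower() for t in re.findall(r"\w+", title) if len(t) > 1]


def hot_keywords(current_titles, baseline_titles, min_count=3, min_ratio=2.0):
    """Return keywords whose title frequency bursts in the current window (assumed scoring)."""
    # Document frequency: count each keyword at most once per title.
    cur = Counter(w for t in current_titles for w in set(tokenize(t)))
    base = Counter(w for t in baseline_titles for w in set(tokenize(t)))
    n_cur = max(len(current_titles), 1)
    n_base = len(baseline_titles)
    scored = []
    for w, c in cur.items():
        if c < min_count:
            continue  # ignore rare words regardless of burst ratio
        # Current relative frequency vs. add-one-smoothed baseline frequency.
        ratio = (c / n_cur) / ((base[w] + 1) / (n_base + 1))
        if ratio >= min_ratio:
            scored.append((ratio, w))
    scored.sort(key=lambda x: (-x[0], x[1]))  # strongest burst first
    return [w for _, w in scored]


def cluster_by_keyword(titles, keywords):
    """Group titles under the highest-ranked hot keyword each one contains."""
    rank = {w: i for i, w in enumerate(keywords)}
    clusters = defaultdict(list)
    for t in titles:
        hits = [w for w in set(tokenize(t)) if w in rank]
        if hits:
            clusters[min(hits, key=rank.get)].append(t)
    return dict(clusters)


if __name__ == "__main__":
    current = ["Earthquake hits city center", "Earthquake rescue teams arrive",
               "Earthquake death toll rises", "Stock market opens flat"]
    baseline = ["Stock market opens flat", "Stock market closes up", "Weather sunny today"]
    kws = hot_keywords(current, baseline)
    print(kws)
    print(cluster_by_keyword(current, kws))
```

Because keywords are discovered from the data rather than supplied by the user, this kind of scheme can surface an unexpected event (here "earthquake") that no one asked the system to watch. That is the limitation of keyword-driven monitoring the abstract points out.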
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2013, Issue 3, pp. 471-474 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (60873203)
Keywords
internet public opinion
topic clustering
hot topic
tracking