A Study on Incremental Text Clustering in Sensitive Topic Detection
(Original title: 敏感话题发现中的增量型文本聚类模型; cited by: 6)
Abstract: Given the massive volume of news that is continuously updated on the Internet, automatically detecting sensitive topics quickly and effectively, and then tracking them over time, has become an active research problem. This paper takes an online public opinion analysis system as its application setting and focuses on the sensitive topic detection stage of that system. After reviewing related work on text clustering, it improves the Single-pass algorithm, which is widely used in Topic Detection and Tracking (TDT), and proposes an incremental text clustering algorithm based on simhash. Experiments on news text crawled by the system in real use show that the proposed algorithm is clearly more efficient at clustering than the original Single-pass algorithm. In practical deployment, the algorithm meets the expected requirements for real-time topic detection and has high practical value.
Authors: Zhang Yuejin (张越今), Ding Ding (丁丁)
Source: Netinfo Security (《信息网络安全》), 2015, No. 9, pp. 170-174 (5 pages)
Keywords: sensitive topic detection; Simhash; incremental text clustering; Single-pass
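
The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the general idea of combining simhash fingerprints with Single-pass clustering. It is not the authors' method. Everything specific here is an assumption made for illustration: 64-bit fingerprints, MD5 as the token hash, term-frequency weights, a Hamming-distance threshold of 3, and the first document's fingerprint serving as the cluster representative. The names `tokenize`, `simhash`, `hamming`, and `SinglePassClusterer` are hypothetical. The regex tokenizer is a placeholder; Chinese news text would need a proper word segmenter.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # assumed fingerprint width


def tokenize(text):
    # Placeholder tokenizer (assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def simhash(text, bits=BITS):
    """Compute a simhash fingerprint using term frequency as the weight."""
    weights = [0] * bits
    for token, count in Counter(tokenize(text)).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            weights[i] += count if (h >> i) & 1 else -count
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class SinglePassClusterer:
    """Single-pass clustering that compares fingerprints, not full vectors."""

    def __init__(self, max_distance=3):
        self.max_distance = max_distance  # assumed threshold
        self.clusters = []  # each: {"fingerprint": int, "docs": [str]}

    def add(self, text):
        """Assign one incoming document to a cluster; return its index."""
        fp = simhash(text)
        best_idx, best_dist = None, self.max_distance + 1
        for idx, cluster in enumerate(self.clusters):
            d = hamming(fp, cluster["fingerprint"])
            if d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx is None:
            # No cluster is within the threshold: start a new topic.
            self.clusters.append({"fingerprint": fp, "docs": [text]})
            return len(self.clusters) - 1
        self.clusters[best_idx]["docs"].append(text)
        return best_idx
```

Each incoming document is processed once and compared to existing clusters by a cheap bitwise operation, which is where a fingerprint-based variant could plausibly save time over comparing full term vectors. The linear scan over clusters is kept for clarity; an index over fingerprint blocks would be a natural refinement.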