Improved Algorithm Study on Topic Detection and Tracking (cited by: 3)
Abstract: Topic detection makes it possible to discover hot spots of online public opinion and breaking events promptly, and to track topics continuously so that the development of public-opinion events can be followed in real time. This paper proposes an improved clustering-based algorithm for topic detection and tracking. First, the feature representation of each document is extended with a trunk vector built from the main constituents (the trunk) of its sentences. Then two center vectors are extracted for each detected topic: a basic center vector, and a trunk center vector derived from the trunk vectors. On this basis, documents are clustered by computing the distance between each document and the center vectors, which keeps the documents within a topic cohesive. In addition, topic keywords are extracted and the thematic correlation between topics is computed from them, which enables sub-topic detection and improves the accuracy of topic detection and tracking. In tests on more than two weeks of data from 5 channels of 10 major websites, the method improved the accuracy of topic detection and tracking to some extent and showed a degree of adaptability and generalizability.
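The abstract does not give the distance measure, the way the two centers are combined, or the clustering procedure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes sparse term-weight vectors, cosine similarity, a single-pass clustering loop, a mixing weight `alpha` and a threshold `threshold` (both hypothetical), and Jaccard overlap of keyword sets as a stand-in for the keyword-based topic correlation. All names (`Topic`, `assign`, `topic_relatedness`) are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

Vector = Dict[str, float]  # sparse vector: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Topic:
    # Running sums act as centers: cosine similarity ignores scale,
    # so the sum points in the same direction as the mean vector.
    basic_center: Vector = field(default_factory=dict)
    trunk_center: Vector = field(default_factory=dict)
    docs: List[int] = field(default_factory=list)

    def add(self, doc_id: int, basic: Vector, trunk: Vector) -> None:
        for term, w in basic.items():
            self.basic_center[term] = self.basic_center.get(term, 0.0) + w
        for term, w in trunk.items():
            self.trunk_center[term] = self.trunk_center.get(term, 0.0) + w
        self.docs.append(doc_id)


def assign(topics: List[Topic], doc_id: int, basic: Vector, trunk: Vector,
           alpha: float = 0.5, threshold: float = 0.4) -> Topic:
    """Attach a document to the most similar topic, or open a new topic.

    alpha and threshold are assumed tuning parameters, not values from the paper.
    """
    best: Optional[Topic] = None
    best_sim = 0.0
    for topic in topics:
        sim = (alpha * cosine(basic, topic.basic_center)
               + (1.0 - alpha) * cosine(trunk, topic.trunk_center))
        if sim > best_sim:
            best, best_sim = topic, sim
    if best is None or best_sim < threshold:
        best = Topic()
        topics.append(best)
    best.add(doc_id, basic, trunk)
    return best


def topic_relatedness(keywords_a: Set[str], keywords_b: Set[str]) -> float:
    """Jaccard overlap of two keyword sets (an assumed relatedness measure)."""
    union = keywords_a | keywords_b
    return len(keywords_a & keywords_b) / len(union) if union else 0.0


# Toy documents: (basic vector, trunk vector), made up for illustration.
documents = [
    ({"earthquake": 1.0, "rescue": 1.0}, {"earthquake": 1.0}),
    ({"earthquake": 1.0, "relief": 1.0}, {"earthquake": 1.0}),
    ({"election": 1.0, "vote": 1.0}, {"election": 1.0}),
]
topics: List[Topic] = []
for doc_id, (basic_vec, trunk_vec) in enumerate(documents):
    assign(topics, doc_id, basic_vec, trunk_vec)
```

In this toy run the two earthquake documents share a trunk term and fall into one topic, while the election document opens a second one. A pair of topics whose keyword sets overlap strongly under `topic_relatedness` could then be treated as candidate parent and sub-topic, though the paper's actual criterion is not stated in the abstract.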
Authors: 肖红 (Xiao Hong), 许少华 (Xu Shaohua)
Source: Computer Technology and Development (《计算机技术与发展》), 2014, No. 9, pp. 84-88 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 61170132)
Keywords: topic detection and tracking; clustering algorithm; feature vector; network public opinion