
An Improved Algorithm Based on Single-Pass for Online Topic Detection (cited by: 4)
Abstract: Existing topic detection methods mainly rely on Single-Pass and its improved variants for cluster analysis. These algorithms use a single similarity measure and ignore the structural characteristics of the text, which reduces clustering accuracy. To address this problem, the paper improves the similarity calculation of Single-Pass and proposes a combined multi-similarity strategy that takes the title, abstract, time, place names, and source of a text into account, using the analytic hierarchy process (AHP) to compute and assign a different weight to each element. Because food safety is a topic of wide public concern, the experiment used a web crawler to collect and filter media reports on food safety from the most recent three years and analyzed them as the data set. The results show that the improved Single-Pass clustering algorithm proposed in the paper achieves higher topic detection accuracy.
Authors: MA Yongjun; LIU Yang; LI Yajun; WANG Rui (College of Computer Science and Information Engineering, Tianjin University of Science & Technology, Tianjin 300457, China; Food Safety Management and Strategic Research Center, Tianjin University of Science & Technology, Tianjin 300222, China)
Source: Journal of Tianjin University of Science & Technology (indexed in CAS; Peking University Core Journals), 2017, No. 6, pp. 73-78 (6 pages)
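Funding: Major Project of the Tianjin Municipal Education Commission (2014ZD22); Tianjin Research Program of Application Foundation and Advanced Technology (14JCQNJC00300)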
Keywords: internet public opinion; Single-Pass; similarity calculation; food safety
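The strategy summarized in the abstract (Single-Pass clustering driven by a weighted combination of per-element similarities) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Doc` fields, the Jaccard and time-decay measures, the `WEIGHTS` values, the 30-day window, the 0.3 threshold, and the choice to compare a document against the most similar cluster member are all assumptions made for illustration; the paper derives its weights with AHP, and those values are not given in this record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    summary: str
    published: date
    places: frozenset
    source: str


# Hypothetical weights that sum to 1. The paper obtains its weights via AHP;
# the numbers below are placeholders, not values from the paper.
WEIGHTS = {"title": 0.35, "summary": 0.30, "time": 0.15, "place": 0.15, "source": 0.05}


def jaccard(a, b):
    """Jaccard overlap of two token collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def time_sim(d1, d2, window_days=30):
    """Linear decay to 0 as the publication gap approaches window_days (assumed)."""
    gap = abs((d1 - d2).days)
    return max(0.0, 1.0 - gap / window_days)


def combined_similarity(x, y, weights=WEIGHTS):
    """Weighted sum of the five per-element similarities."""
    parts = {
        "title": jaccard(x.title.lower().split(), y.title.lower().split()),
        "summary": jaccard(x.summary.lower().split(), y.summary.lower().split()),
        "time": time_sim(x.published, y.published),
        "place": jaccard(x.places, y.places),
        "source": 1.0 if x.source == y.source else 0.0,
    }
    return sum(weights[k] * parts[k] for k in weights)


def single_pass(docs, threshold=0.3):
    """Single-Pass: visit each document once, in arrival order.

    A document joins the most similar existing cluster if that similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for doc in docs:
        best_idx, best_score = None, 0.0
        for i, cluster in enumerate(clusters):
            # Assumption: score a cluster by its most similar member.
            score = max(combined_similarity(doc, member) for member in cluster)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= threshold:
            clusters[best_idx].append(doc)
        else:
            clusters.append([doc])
    return clusters


# Made-up example documents, for illustration only.
d1 = Doc("milk powder contamination found",
         "inspectors found contaminated milk powder in stores",
         date(2016, 3, 1), frozenset({"Tianjin"}), "news-a")
d2 = Doc("contamination found in milk powder",
         "contaminated milk powder recalled from stores",
         date(2016, 3, 3), frozenset({"Tianjin"}), "news-b")
d3 = Doc("restaurant hygiene inspection results",
         "city publishes hygiene ratings for restaurants",
         date(2017, 8, 10), frozenset({"Beijing"}), "news-a")

clusters = single_pass([d1, d2, d3])
print([len(c) for c in clusters])
```

Because each document is processed once and never reassigned, the result depends on arrival order, which is what makes the approach suitable for online streams. Real Chinese-language text would also need word segmentation in place of the whitespace `split()` used here.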