MB-SinglePass: Microblog Topic Detection Based on Combined Similarity

Cited by: 24
Abstract: Topic detection techniques have achieved good results in research on traditional media. This paper examines how to adapt topic detection to short texts on new media such as microblogs, and how to evaluate its performance there. Drawing on structured information such as the following and follower relationships among users, and on the inherent links between posts such as reposts and comments, it proposes MB-SinglePass, a topic detection algorithm for microblogs. Beyond these microblog characteristics, the algorithm addresses the sparse features of short texts by using a synonym dictionary to extend microblog features and enrich the feature information. To overcome the shortcomings of using cosine similarity, Jaccard similarity, or semantic similarity alone, it adopts a combined similarity strategy. Compared with traditional algorithms, MB-SinglePass achieves better performance on a real dataset collected from Sina Weibo. In addition, a controlled experiment on similarity strategies shows that combined similarity outperforms any single similarity measure.
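The abstract does not give the algorithm's formulas, so the following is only a minimal sketch of the general idea: a single-pass clustering loop that extends each post with synonyms and scores it against existing topics with a weighted mix of cosine and Jaccard similarity. The synonym table, the equal weights, the threshold, and all function names are hypothetical choices for illustration. The semantic-similarity term and the follower and repost signals described in the paper are omitted.

```python
import math
from collections import Counter

# Hypothetical toy synonym table; the paper's actual synonym dictionary is not given.
SYNONYMS = {"quake": "earthquake", "tremor": "earthquake"}


def expand(tokens):
    """Feature extension: append the mapped synonym of each token that has one."""
    return tokens + [SYNONYMS[t] for t in tokens if t in SYNONYMS]


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between the token sets of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined(a, b, w_cos=0.5, w_jac=0.5):
    """Weighted sum of the two measures; the weights are assumed, not from the paper."""
    return w_cos * cosine(a, b) + w_jac * jaccard(a, b)


def single_pass(posts, threshold=0.3):
    """Assign each tokenized post to the most similar topic, or open a new one.

    Each topic is represented by the pooled tokens of its posts. Returns one
    topic index per post, in input order.
    """
    topics = []  # pooled token list per topic
    labels = []
    for tokens in posts:
        tokens = expand(tokens)
        best_idx, best_sim = -1, 0.0
        for idx, pooled in enumerate(topics):
            sim = combined(tokens, pooled)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            topics[best_idx].extend(tokens)
            labels.append(best_idx)
        else:
            topics.append(list(tokens))
            labels.append(len(topics) - 1)
    return labels


posts = [
    ["quake", "japan"],
    ["earthquake", "japan", "tsunami"],
    ["football", "match"],
]
print(single_pass(posts))
```

In this toy run the first two posts share no literal word for the event until "quake" is extended with "earthquake", which is the kind of sparsity the synonym extension is meant to relieve. Because topics are created in arrival order, the result of a single-pass scheme depends on the order of the posts.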
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 10, pp. 198-202 (5 pages)
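Funding: Supported by the Open Project of the State Key Laboratory of Software Development Environment (SKLSDE-2011KF-06) and the National High Technology Research and Development Program of China (863 Program) (2009AA043303)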
Keywords: Microblog, SinglePass, Topic detection, Text similarity, Synonym extension