Abstract
[Purpose/significance] Accurately calculating microblog similarity can improve the efficiency of microblog topic mining and has practical significance for public opinion governance and information security. To address the semantic sparsity and high dimensionality of microblog text, this paper proposes a super-edge similarity algorithm that incorporates non-text features of microblogs. [Method/process] The mechanism by which microblog public opinion arises is analyzed, a super network model is used to represent the formation of microblog public opinion topics, and the super-edge similarity algorithm is constructed by calculating the similarity of each subnet layer and the contribution of each subnet layer to topic formation. [Result/conclusion] The proposed similarity method helps improve the topic clustering of microblog public opinion information. In particular, it clearly distinguishes topics among microblog posts whose wording is highly similar.
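The abstract describes the algorithm only at a high level: a similarity is computed for each subnet layer and combined according to each layer's contribution to topic formation. The sketch below is one possible reading of that idea, not the authors' formulation. The layer names (`keywords`, `hashtags`, `users`), the use of Jaccard similarity within a layer, the weighted-average combination, and the weight values are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical layer names and contribution weights; the abstract does not
# specify which subnet layers are used or how contributions are estimated.
WEIGHTS: Dict[str, float] = {"keywords": 0.5, "hashtags": 0.3, "users": 0.2}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def super_edge_similarity(
    edge_a: Dict[str, Set[str]],
    edge_b: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-layer similarities (an assumed combination rule)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for layer, weight in weights.items():
        # A layer missing from a post is treated as an empty node set.
        score += weight * jaccard(edge_a.get(layer, set()), edge_b.get(layer, set()))
    return score / total


# Two posts with identical wording but different non-text features.
post_a = {"keywords": {"flood", "rescue"}, "hashtags": {"cityA"}, "users": {"u1"}}
post_b = {"keywords": {"flood", "rescue"}, "hashtags": {"cityB"}, "users": {"u2"}}

text_only = jaccard(post_a["keywords"], post_b["keywords"])
combined = super_edge_similarity(post_a, post_b, WEIGHTS)
print(text_only, combined)
```

In this toy case the two posts are indistinguishable by text alone, while the combined score is lower because the non-text layers differ. That is the kind of topic separation the abstract reports for posts with highly similar wording.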
Authors
梁晓贺
田儒雅
吴蕾
张学福
Liang Xiaohe; Tian Ruya; Wu Lei; Zhang Xuefu (Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081)
Source
《图书情报工作》
CSSCI
Peking University Core Journal
2020, No. 11, pp. 77-86 (10 pages)
Library and Information Service
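Funding
Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences, "Scientific and Technical Information Analysis and Evaluation Innovation Team" (Grant No. CAAS-ASTIP-2016-AII);
Fundamental Research Funds project of the Agricultural Information Institute, Chinese Academy of Agricultural Sciences, "Weighting-Strategy-Based Topic Mining of Emergent Public Opinion in Big-Data Microblogs" (Grant No. JBYW-AII-2017-29). This paper is one of the research outputs of these projects.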
Keywords
super-edge similarity
topic detection
super network
microblog