
A Feature Selection Method Based on Dynamic Co-word Network for Microblog Topic Detection (cited 13 times)
Abstract: Microblog texts are short and highly dynamic, which calls for new feature extraction methods suited to the text clustering algorithms used to detect topics in microblogs. To address this, the paper applies the idea of term co-occurrence to build a dynamic co-word network over time-ordered microblog texts. In this network, edge weights decay linearly over time, and the weight of each text feature is computed from the degree centrality of its node. Experiments on datasets sampled from Sina Weibo show that, compared with the conventional document frequency method, the dynamic co-word network method is better suited to extracting microblog text features and achieves better topic detection results.
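The abstract outlines three steps (build a co-word network from time-ordered posts, decay edge weights linearly, rank terms by degree centrality) but does not give the exact decay formula, the co-occurrence window, or whether centrality is normalized. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: two terms co-occur if they appear in the same post; a post of age `t_now - t` contributes weight `1 - decay_rate * age`, floored at zero; and "degree centrality" is taken as weighted degree (node strength), which ranks terms the same way as a version normalized by a constant. The function names, the `decay_rate` parameter, and the pre-segmented English tokens are hypothetical. A plain document frequency count is included for comparison.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Post = Tuple[float, List[str]]  # (timestamp, already-segmented tokens)
Edge = Tuple[str, str]          # term pair stored in sorted order


def decayed_weight(t_event: float, t_now: float, decay_rate: float) -> float:
    """Assumed linear decay: 1.0 at t_event, minus decay_rate per time unit, floored at 0."""
    return max(0.0, 1.0 - decay_rate * (t_now - t_event))


def build_dynamic_co_word_network(posts: List[Post], t_now: float,
                                  decay_rate: float) -> Dict[Edge, float]:
    """Add a decayed weight to every term pair that co-occurs in one post."""
    edges: Dict[Edge, float] = defaultdict(float)
    for t, tokens in posts:
        if t > t_now:
            continue  # posts after the evaluation time are not yet visible
        w = decayed_weight(t, t_now, decay_rate)
        if w <= 0.0:
            continue  # fully decayed posts contribute nothing
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += w
    return dict(edges)


def degree_centrality(edges: Dict[Edge, float]) -> Dict[str, float]:
    """Weighted degree (node strength): sum of the weights of incident edges."""
    strength: Dict[str, float] = defaultdict(float)
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
    return dict(strength)


def select_features(posts: List[Post], t_now: float, decay_rate: float,
                    k: int) -> List[str]:
    """Keep the k terms with the highest centrality (ties broken alphabetically)."""
    weights = degree_centrality(build_dynamic_co_word_network(posts, t_now, decay_rate))
    return sorted(weights, key=lambda term: (-weights[term], term))[:k]


def document_frequency(posts: List[Post]) -> Dict[str, int]:
    """Baseline: number of posts containing each term, ignoring time."""
    df: Dict[str, int] = defaultdict(int)
    for _, tokens in posts:
        for term in set(tokens):
            df[term] += 1
    return dict(df)


if __name__ == "__main__":
    posts = [
        (0, ["earthquake", "rescue", "sichuan"]),
        (18, ["concert", "ticket"]),
        (19, ["earthquake", "rescue", "donation"]),
        (20, ["earthquake", "donation"]),
    ]
    print(select_features(posts, t_now=20, decay_rate=0.1, k=2))
    print(document_frequency(posts))
```

With `t_now=20` and `decay_rate=0.1`, the post at time 0 has fully decayed, so "sichuan" receives no weight in the network, while `document_frequency` still counts it once. The demo's top two features are `['donation', 'earthquake']`, which tie at a strength of about 2.8.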
Source: Documentation, Information & Knowledge (《图书情报知识》; CSSCI, Peking University Core Journal), 2016, No. 3, pp. 80-88 (9 pages)
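Funding: One of the outputs of the National Social Science Fund of China project "Research on Trust-Based Word-of-Mouth Information Diffusion Patterns in Online Communities and Their Evolution" (12CTQ044)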
Keywords: Microblog; Topic detection; Dynamic co-word network; Feature selection; Text clustering