Abstract
With the development of the mobile Internet, microblog (Weibo) topics have become popular, and a single hot topic may attract tens of thousands of comments. Stance detection on a microblog topic aims to determine automatically whether the author of a post is in favor of, against, or neutral (neither) toward the given target. First, Word2Vec is used to train a vector for every word in the corpus, which captures the semantic information of each sentence. Second, the TextRank keyword extraction method is used to build a set of thematic words that serve as the stance features of the topic, while a sentiment lexicon supplies the sentiment information of each sentence. Finally, the word vectors retained after feature selection are used to train a Support Vector Machine (SVM), which then predicts the stance and completes the detection model. Experimental results show that stance features combining thematic words and sentiment words achieve good stance detection performance.
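The feature side of the pipeline described in the abstract (TextRank thematic words plus sentiment-lexicon words, then word vectors restricted to those words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: posts are assumed to be already segmented into tokens with stop words removed, word vectors are assumed to be given as a plain dict, and the function names `textrank_keywords` and `stance_features` as well as the exact feature design (average the vectors of selected words, then append positive and negative sentiment-word counts) are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(docs, window=2, d=0.85, iterations=50, top_k=10):
    """Rank words by TextRank over a weighted co-occurrence graph.

    docs: list of token lists (already segmented, stop words removed).
    Two distinct words are linked when they occur within `window` tokens;
    the edge weight counts how often that happens.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                u = tokens[j]
                if u != w:
                    graph[w][u] += 1.0
                    graph[u][w] += 1.0

    # Total edge weight of each node, used to normalize the "vote" it casts.
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # WS(v) = (1 - d) + d * sum_u [w_uv / out(u)] * WS(u)
        scores = {
            w: (1 - d) + d * sum(wt / out_weight[u] * scores[u] for u, wt in graph[w].items())
            for w in graph
        }
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def stance_features(tokens, topic_words, vectors, pos_lexicon, neg_lexicon):
    """Build one feature vector for a post (hypothetical feature design).

    Keeps only tokens that are topic words or sentiment-lexicon entries
    and that have a word vector, averages those vectors, then appends the
    counts of positive and negative sentiment words in the post.
    """
    dim = len(next(iter(vectors.values())))
    selected = [
        t for t in tokens
        if t in vectors and (t in topic_words or t in pos_lexicon or t in neg_lexicon)
    ]
    total = [0.0] * dim
    for t in selected:
        for k, x in enumerate(vectors[t]):
            total[k] += x
    # Average of the selected word vectors (all zeros if nothing was selected).
    avg = [x / len(selected) for x in total] if selected else total
    pos = sum(1 for t in tokens if t in pos_lexicon)
    neg = sum(1 for t in tokens if t in neg_lexicon)
    return avg + [float(pos), float(neg)]
```

In a fuller system, `vectors` might be a dict built from the vectors of a trained gensim `Word2Vec` model (its `wv` attribute), `topic_words` could be the output of `textrank_keywords` over all posts of one topic, and the resulting feature vectors, paired with favor/against/neither labels, could be used to train a linear SVM such as scikit-learn's `SVC(kernel="linear")`. Restricting the averaged vector to thematic and sentiment words is one simple way to realize "feature selection"; the paper may select features differently.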
Authors
ZHENG Hai-Yang (郑海洋)
GAO Jun-Bo (高俊波)
QIU Jie (邱杰)
JIAO Feng (焦凤)
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Source
Computer Systems & Applications (《计算机系统应用》)
2018, No. 9, pp. 118-123 (6 pages)